
Showing 1–50 of 483 results for author: Siddhartha

Searching in archive cs.
  1. arXiv:2406.19314  [pdf, other]

    cs.CL cs.AI cs.LG

    LiveBench: A Challenging, Contamination-Free LLM Benchmark

    Authors: Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Ben Feuer, Siddhartha Jain, Ravid Shwartz-Ziv, Neel Jain, Khalid Saifullah, Siddartha Naidu, Chinmay Hegde, Yann LeCun, Tom Goldstein, Willie Neiswanger, Micah Goldblum

    Abstract: Test set contamination, wherein test data from a benchmark ends up in a newer model's training set, is a well-documented obstacle for fair LLM evaluation and can quickly render benchmarks obsolete. To mitigate this, many recent benchmarks crowdsource new prompts and evaluations from human or LLM judges; however, these can introduce significant biases, and break down when scoring hard questions. In…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.08775  [pdf, other]

    cs.CV

    ALINA: Advanced Line Identification and Notation Algorithm

    Authors: Mohammed Abdul Hafeez Khan, Parth Ganeriwala, Siddhartha Bhattacharyya, Natasha Neogi, Raja Muthalagu

    Abstract: Labels are the cornerstone of supervised machine learning algorithms. Most visual recognition methods are fully supervised, using bounding boxes or pixel-wise segmentations for object localization. Traditional labeling methods, such as crowd-sourcing, are prohibitive due to cost, data privacy, amount of time, and potential errors on large datasets. To address these issues, we propose a novel annot…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Paper has been accepted to The 3rd CVPR Workshop on Vision Datasets Understanding, 2024

  3. arXiv:2406.06461  [pdf, other]

    cs.CL

    Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies

    Authors: Junlin Wang, Siddhartha Jain, Dejiao Zhang, Baishakhi Ray, Varun Kumar, Ben Athiwaratkun

    Abstract: A diverse array of reasoning strategies has been proposed to elicit the capabilities of large language models. However, in this paper, we point out that traditional evaluations which focus solely on performance metrics miss a key factor: the increased effectiveness due to additional compute. By overlooking this aspect, a skewed view of strategy efficiency is often presented. This paper introduces…

    Submitted 14 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  4. arXiv:2406.02402  [pdf, other]

    math.OC cs.GT stat.ML

    Online Fair Allocation of Perishable Resources

    Authors: Siddhartha Banerjee, Chamsi Hssaine, Sean R. Sinclair

    Abstract: We consider a practically motivated variant of the canonical online fair allocation problem: a decision-maker has a budget of perishable resources to allocate over a fixed number of rounds. Each round sees a random number of arrivals, and the decision-maker must commit to an allocation for these individuals before moving on to the next round. The goal is to construct a sequence of allocations that…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 51 pages, 8 figures

    MSC Class: 91B32

  5. arXiv:2405.19307  [pdf, other]

    cs.RO

    Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels

    Authors: Abhay Deshpande, Liyiming Ke, Quinn Pfeifer, Abhishek Gupta, Siddhartha S. Srinivasa

    Abstract: We consider imitation learning with access only to expert demonstrations, whose real-world application is often limited by covariate shift due to compounding errors during execution. We investigate the effectiveness of the Continuity-based Corrective Labels for Imitation Learning (CCIL) framework in mitigating this issue for real-world fine manipulation tasks. CCIL generates corrective labels by l…

    Submitted 3 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  6. arXiv:2405.19101  [pdf, other]

    cs.LG

    Poseidon: Efficient Foundation Models for PDEs

    Authors: Maximilian Herde, Bogdan Raonić, Tobias Rohner, Roger Käppeli, Roberto Molinaro, Emmanuel de Bézenac, Siddhartha Mishra

    Abstract: We introduce Poseidon, a foundation model for learning the solution operators of PDEs. It is based on a multiscale operator transformer, with time-conditioned layer norms that enable continuous-in-time evaluations. A novel training strategy leveraging the semi-group property of time-dependent PDEs to allow for significant scaling-up of the training data is also proposed. Poseidon is pretrained on…

    Submitted 29 May, 2024; originally announced May 2024.

  7. arXiv:2405.17606  [pdf, other]

    cs.RO

    A Patient-Specific Framework for Autonomous Spinal Fixation via a Steerable Drilling Robot

    Authors: Susheela Sharma, Sarah Go, Zeynep Yakay, Yash Kulkarni, Siddhartha Kapuria, Jordan P. Amadio, Reza Rajebi, Mohsen Khadem, Nassir Navab, Farshid Alambeigi

    Abstract: In this paper, with the goal of enhancing the minimally invasive spinal fixation procedure in osteoporotic patients, we propose a first-of-its-kind image-guided robotic framework for performing an autonomous and patient-specific procedure using a unique concentric tube steerable drilling robot (CT-SDR). Particularly, leveraging a CT-SDR, we introduce the concept of J-shape drilling based on a pre-…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 10 pages, 3 figures

  8. arXiv:2405.16128  [pdf, other]

    cs.AI cs.CL

    How Well Do Deep Learning Models Capture Human Concepts? The Case of the Typicality Effect

    Authors: Siddhartha K. Vemuri, Raj Sanjay Shah, Sashank Varma

    Abstract: How well do representations learned by ML models align with those of humans? Here, we consider concept representations learned by deep learning models and evaluate whether they show a fundamental behavioral signature of human concepts, the typicality effect. This is the finding that people judge some instances (e.g., robin) of a category (e.g., Bird) to be more typical than others (e.g., penguin).…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: To appear at CogSci 2024

  9. arXiv:2405.16034  [pdf, other]

    cs.CV

    DiffuBox: Refining 3D Object Detection with Point Diffusion

    Authors: Xiangyu Chen, Zhenzhen Liu, Katie Z Luo, Siddhartha Datta, Adhitya Polavaram, Yan Wang, Yurong You, Boyi Li, Marco Pavone, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q. Weinberger

    Abstract: Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-…

    Submitted 24 May, 2024; originally announced May 2024.

  10. arXiv:2405.15218  [pdf, other]

    cs.LG

    AGS-GNN: Attribute-guided Sampling for Graph Neural Networks

    Authors: Siddhartha Shankar Das, S M Ferdous, Mahantesh M Halappanavar, Edoardo Serra, Alex Pothen

    Abstract: We propose AGS-GNN, a novel attribute-guided sampling algorithm for Graph Neural Networks (GNNs) that exploits node features and connectivity structure of a graph while simultaneously adapting for both homophily and heterophily in graphs. (In homophilic graphs vertices of the same class are more likely to be connected, and vertices of different classes tend to be linked in heterophilic graphs.) Wh…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: The paper has been accepted to KDD'24 in the research track

  11. arXiv:2405.14558  [pdf, other]

    cs.LG

    FUSE: Fast Unified Simulation and Estimation for PDEs

    Authors: Levi E. Lingsch, Dana Grund, Siddhartha Mishra, Georgios Kissas

    Abstract: The joint prediction of continuous fields and statistical estimation of the underlying discrete parameters is a common problem for many physical systems, governed by PDEs. Hitherto, it has been separately addressed by employing operator learning surrogates for field prediction while using simulation-based inference (and its variants) for statistical parameter determination. Here, we argue that sol…

    Submitted 23 May, 2024; originally announced May 2024.

  12. arXiv:2405.05066  [pdf, other]

    cs.AI cs.CY cs.LG

    Designing Skill-Compatible AI: Methodologies and Frameworks in Chess

    Authors: Karim Hamade, Reid McIlroy-Young, Siddhartha Sen, Jon Kleinberg, Ashton Anderson

    Abstract: Powerful artificial intelligence systems are often used in settings where they must interact with agents that are computationally much weaker, for example when they work alongside humans or operate in complex environments where some tasks are handled by algorithms, heuristics, or other entities of varying computational power. For AI agents to successfully interact in these settings, however, achie…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 18 pages, 5 figures, 15 tables, Published In The Twelfth International Conference on Learning Representations, ICLR 2024

  13. ContextQ: Generated Questions to Support Meaningful Parent-Child Dialogue While Co-Reading

    Authors: Griffin Dietz Smith, Siddhartha Prasad, Matt J. Davidson, Leah Findlater, R. Benjamin Shapiro

    Abstract: Much of early literacy education happens at home with caretakers reading books to young children. Prior research demonstrates how having dialogue with children during co-reading can develop critical reading readiness skills, but most adult readers are unsure if and how to lead effective conversations. We present ContextQ, a tablet-based reading application to unobtrusively present auto-generated d…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: ACM Interaction Design and Children (IDC) 2024

  14. arXiv:2403.20327  [pdf, other]

    cs.CL cs.AI

    Gecko: Versatile Text Embeddings Distilled from Large Language Models

    Authors: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Iftekhar Naim

    Abstract: We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 18 pages

  15. arXiv:2403.15476  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Learning to Infer Generative Template Programs for Visual Concepts

    Authors: R. Kenny Jones, Siddhartha Chaudhuri, Daniel Ritchie

    Abstract: People grasp flexible visual concepts from a few examples. We explore a neurosymbolic system that learns how to infer programs that capture visual concepts in a domain-general fashion. We introduce Template Programs: programmatic expressions from a domain-specific language that specify structural and parametric patterns common to an input concept. Our framework supports multiple concept-related ta…

    Submitted 9 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: ICML 2024; Project page: https://rkjones4.github.io/template.html

  16. arXiv:2403.11298  [pdf, other]

    cs.RO

    Multi-Sample Long Range Path Planning under Sensing Uncertainty for Off-Road Autonomous Driving

    Authors: Matt Schmittle, Rohan Baijal, Brian Hou, Siddhartha Srinivasa, Byron Boots

    Abstract: We focus on the problem of long-range dynamic replanning for off-road autonomous vehicles, where a robot plans paths through a previously unobserved environment while continuously receiving noisy local observations. An effective approach for planning under sensing uncertainty is determinization, where one converts a stochastic world into a deterministic one and plans under this simplification. Thi…

    Submitted 17 March, 2024; originally announced March 2024.

  17. arXiv:2403.09782  [pdf, other]

    cs.NI eess.SP

    Redundancy Transmission in UAV-Aided LoRa Networks Featuring Wake-Up Radios

    Authors: Siddhartha S. Borkotoky

    Abstract: We consider a LoRa sensor network featuring a UAV-mounted gateway for collecting sensor data (messages). Wake-up radios (WuR) are employed to inform the sensors of the UAV's arrival. Building on an existing random access scheme for such setups, we propose and evaluate two redundancy transmission protocols for enhancing the reliability of the data transfer. One protocol employs fountain-coded trans…

    Submitted 26 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  18. arXiv:2403.07384  [pdf, other]

    cs.CL cs.AI cs.LG

    SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models

    Authors: Yu Yang, Siddhartha Mishra, Jeffrey N Chiang, Baharan Mirzasoleiman

    Abstract: Despite the effectiveness of data selection for large language models (LLMs) during pretraining and instruction fine-tuning phases, improving data efficiency in supervised fine-tuning (SFT) for specialized domains poses significant challenges due to the complexity of fine-tuning data. To bridge this gap, we introduce an effective and scalable data selection method for SFT, SmallToLarge (S2L), whic…

    Submitted 12 March, 2024; originally announced March 2024.

  19. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  20. arXiv:2403.04134  [pdf, other]

    cs.RO

    An Adaptable, Safe, and Portable Robot-Assisted Feeding System

    Authors: Ethan Kroll Gordon, Rajat Kumar Jenamani, Amal Nanavati, Ziang Liu, Haya Bolotski, Raida Karim, Daniel Stabile, Atharva Kashyap, Bernie Hao Zhu, Xilai Dai, Tyler Schrenk, Jonathan Ko, Taylor Kessler Faulkner, Tapomayukh Bhattacharjee, Siddhartha Srinivasa

    Abstract: We demonstrate a robot-assisted feeding system that enables people with mobility impairments to feed themselves. Our system design embodies Safety, Portability, and User Control, with comprehensive full-stack safety checks, the ability to be mounted on and powered by any powered wheelchair, and a custom web-app allowing care-recipients to leverage their own assistive devices for robot control. For…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: HRI 2024 Demo; Corrected inaccurate author ordering in ACM DL which occurred due to formatting issues

  21. arXiv:2403.00489  [pdf, other]

    cs.HC cs.RO

    Multiple Ways of Working with Users to Develop Physically Assistive Robots

    Authors: Amal Nanavati, Max Pascher, Vinitha Ranganeni, Ethan K. Gordon, Taylor Kessler Faulkner, Siddhartha S. Srinivasa, Maya Cakmak, Patrícia Alves-Oliveira, Jens Gerken

    Abstract: Despite the growth of physically assistive robotics (PAR) research over the last decade, nearly half of PAR user studies do not involve participants with the target disabilities. There are several reasons for this -- recruitment challenges, small sample sizes, and transportation logistics -- all influenced by systemic barriers that people with disabilities face. However, it is well-established tha…

    Submitted 7 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: A3DE '24: Workshop on Assistive Applications, Accessibility, and Disability Ethics at the ACM/IEEE International Conference on Human-Robot Interaction

  22. arXiv:2402.17720  [pdf, other]

    cs.LG cs.DS cs.IT

    The SMART approach to instance-optimal online learning

    Authors: Siddhartha Banerjee, Alankrita Bhatt, Christina Lee Yu

    Abstract: We devise an online learning algorithm -- titled Switching via Monotone Adapted Regret Traces (SMART) -- that adapts to the data and achieves regret that is instance optimal, i.e., simultaneously competitive on every input sequence compared to the performance of the follow-the-leader (FTL) policy and the worst case guarantee of any other input policy. We show that the regret of the SMART policy on…

    Submitted 27 February, 2024; originally announced February 2024.

  23. arXiv:2402.16994  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    GEM3D: GEnerative Medial Abstractions for 3D Shape Synthesis

    Authors: Dmitry Petrov, Pradyumn Goyal, Vikas Thamizharasan, Vladimir G. Kim, Matheus Gadelha, Melinos Averkiou, Siddhartha Chaudhuri, Evangelos Kalogerakis

    Abstract: We introduce GEM3D -- a new deep, topology-aware generative model of 3D shapes. The key ingredient of our method is a neural skeleton-based representation encoding information on both shape topology and geometry. Through a denoising diffusion probabilistic model, our method first generates skeleton-based representations following the Medial Axis Transform (MAT), then generates surfaces through a s…

    Submitted 10 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Webpage: https://lodurality.github.io/GEM3D/ -- Cond. accept. to SIGGRAPH 2024 (conf. track) -- Changes (based on reviews): changed style to sigconf; rearranged figures for readability; added missing citations; fixed misaligned centers in Fig. 3; added failure cases (Fig. 10); rewrote discussion; added categories averages to Tab. 8; added Tab. 10 with model capacities

  24. arXiv:2402.10926  [pdf, other]

    math.NA cs.LG

    Numerical analysis of physics-informed neural networks and related models in physics-informed machine learning

    Authors: Tim De Ryck, Siddhartha Mishra

    Abstract: Physics-informed neural networks (PINNs) and their variants have been very popular in recent years as algorithms for the numerical simulation of both forward and inverse problems for partial differential equations. This article aims to provide a comprehensive review of currently available results on the numerical analysis of PINNs and related models that constitute the backbone of physics-informed…

    Submitted 30 January, 2024; originally announced February 2024.

    MSC Class: 65M15

  25. arXiv:2402.03545  [pdf, other]

    cs.LG

    Online Feature Updates Improve Online (Generalized) Label Shift Adaptation

    Authors: Ruihan Wu, Siddhartha Datta, Yi Su, Dheeraj Baby, Yu-Xiang Wang, Kilian Q. Weinberger

    Abstract: This paper addresses the prevalent issue of label shift in an online setting with missing labels, where data distributions change over time and obtaining timely labels is challenging. While existing methods primarily focus on adjusting or updating the final layer of a pre-trained classifier, we explore the untapped potential of enhancing feature representations using unlabeled data at test-time. O…

    Submitted 5 February, 2024; originally announced February 2024.

  26. arXiv:2402.03175  [pdf, other]

    cs.LG cs.AI

    The Matrix: A Bayesian learning model for LLMs

    Authors: Siddhartha Dalal, Vishal Misra

    Abstract: In this paper, we introduce a Bayesian learning model to understand the behavior of Large Language Models (LLMs). We explore the optimization metric of LLMs, which is based on predicting the next token, and develop a novel model grounded in this principle. Our approach involves constructing an ideal generative text model represented by a multinomial transition probability matrix with a prior, and…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 12 pages, 6 figures

    ACM Class: I.2.7

  27. arXiv:2402.00097  [pdf, other]

    cs.SE cs.LG

    Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM

    Authors: Gabriel Ryan, Siddhartha Jain, Mingyue Shang, Shiqi Wang, Xiaofei Ma, Murali Krishna Ramanathan, Baishakhi Ray

    Abstract: Testing plays a pivotal role in ensuring software quality, yet conventional Search Based Software Testing (SBST) methods often struggle with complex software units, achieving suboptimal test coverage. Recent works using large language models (LLMs) for test generation have focused on improving generation quality through optimizing the test generation context and correcting errors in model outputs,…

    Submitted 2 April, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

  28. arXiv:2401.14098  [pdf, other]

    cs.CR

    Carry Your Fault: A Fault Propagation Attack on Side-Channel Protected LWE-based KEM

    Authors: Suparna Kundu, Siddhartha Chowdhury, Sayandeep Saha, Angshuman Karmakar, Debdeep Mukhopadhyay, Ingrid Verbauwhede

    Abstract: Post-quantum cryptographic (PQC) algorithms, especially those based on the learning with errors (LWE) problem, have been subjected to several physical attacks in the recent past. Although the attacks broadly belong to two classes - passive side-channel attacks and active fault attacks, the attack strategies vary significantly due to the inherent complexities of such algorithms. Exploring further a…

    Submitted 25 January, 2024; originally announced January 2024.

    ACM Class: E.3.3

  29. arXiv:2401.12604  [pdf, other]

    cs.CC

    On Pigeonhole Principles and Ramsey in TFNP

    Authors: Siddhartha Jain, Jiawei Li, Robert Robere, Zhiyang Xun

    Abstract: The generalized pigeonhole principle says that if tN + 1 pigeons are put into N holes then there must be a hole containing at least t + 1 pigeons. Let t-PPP denote the class of all total NP-search problems reducible to finding such a t-collision of pigeons. We introduce a new hierarchy of classes defined by the problems t-PPP. In addition to being natural problems in TFNP, we show that classes in…

    Submitted 23 January, 2024; originally announced January 2024.

  30. Conceptual Mutation Testing for Student Programming Misconceptions

    Authors: Siddhartha Prasad, Ben Greenman, Tim Nelson, Shriram Krishnamurthi

    Abstract: Context: Students often misunderstand programming problem descriptions. This can lead them to solve the wrong problem, which creates frustration, obstructs learning, and imperils grades. Researchers have found that students can be made to better understand the problem by writing examples before they start programming. These examples are checked against correct and wrong implementations -- analogou…

    Submitted 28 December, 2023; originally announced January 2024.

    Journal ref: The Art, Science, and Engineering of Programming, 2024, Vol. 8, Issue 2, Article 7

  31. arXiv:2312.17227  [pdf, other]

    cs.LG cs.AI

    Gradient-based Planning with World Models

    Authors: Jyothir S V, Siddhartha Jalagam, Yann LeCun, Vlad Sobal

    Abstract: The enduring challenge in the field of artificial intelligence has been the control of systems to achieve desired behaviours. While for systems governed by straightforward dynamics equations, methods like Linear Quadratic Regulation (LQR) have historically proven highly effective, most real-world tasks, which require a general problem-solver, demand world models with dynamics that cannot be easily…

    Submitted 28 December, 2023; originally announced December 2023.

  32. arXiv:2312.16720  [pdf, other]

    cs.CV

    Prompt Expansion for Adaptive Text-to-Image Generation

    Authors: Siddhartha Datta, Alexander Ku, Deepak Ramachandran, Peter Anderson

    Abstract: Text-to-image generation models are powerful but difficult to use. Users craft specific prompts to get better images, though the images can be repetitive. This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts that are optimized such t…

    Submitted 27 December, 2023; originally announced December 2023.

  33. arXiv:2312.15063  [pdf, other]

    cs.LG cond-mat.dis-nn

    A universal approximation theorem for nonlinear resistive networks

    Authors: Benjamin Scellier, Siddhartha Mishra

    Abstract: Resistor networks have recently had a surge of interest as substrates for energy-efficient self-learning machines. This work studies the computational capabilities of these resistor networks. We show that electrical networks composed of voltage sources, linear resistors, diodes and voltage-controlled voltage sources (VCVS) can implement any continuous functions. To prove it, we assume that the cir…

    Submitted 22 December, 2023; originally announced December 2023.

  34. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  35. Conceptualizing Suicidal Behavior: Utilizing Explanations of Predicted Outcomes to Analyze Longitudinal Social Media Data

    Authors: Van Minh Nguyen, Nasheen Nur, William Stern, Thomas Mercer, Chiradeep Sen, Siddhartha Bhattacharyya, Victor Tumbiolo, Seng Jhing Goh

    Abstract: The COVID-19 pandemic has escalated mental health crises worldwide, with social isolation and economic instability contributing to a rise in suicidal behavior. Suicide can result from social factors such as shame, abuse, abandonment, and mental health conditions like depression, Post-Traumatic Stress Disorder (PTSD), Attention-Deficit/Hyperactivity Disorder (ADHD), anxiety disorders, and bipolar d…

    Submitted 30 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Presented at ICMLA 2023, Special Session: Machine Learning in Health, 8 pages, 6 figures, 7 tables

  36. arXiv:2312.03742  [pdf, other]

    cs.CL cs.LG

    Clinical Risk Prediction Using Language Models: Benefits And Considerations

    Authors: Angeela Acharya, Sulabh Shrestha, Anyi Chen, Joseph Conte, Sanja Avramovic, Siddhartha Sikdar, Antonios Anastasopoulos, Sanmay Das

    Abstract: The utilization of Electronic Health Records (EHRs) for clinical risk prediction is on the rise. However, strict privacy regulations limit access to comprehensive health records, making it challenging to apply standard machine learning algorithms in practical real-world scenarios. Previous research has addressed this data limitation by incorporating medical ontologies and employing transfer learni…

    Submitted 28 November, 2023; originally announced December 2023.

    Comments: 12 pages, 6 figures, 4 tables

  37. arXiv:2312.02722  [pdf, other]

    cs.CG cs.DS

    Improved Algorithms for Minimum-Membership Geometric Set Cover

    Authors: Sathish Govindarajan, Siddhartha Sarkar

    Abstract: Bandyapadhyay et al. introduced the generalized minimum-membership geometric set cover (GMMGSC) problem [SoCG, 2023], which is defined as follows. We are given two sets $P$ and $P'$ of points in $\mathbb{R}^{2}$, $n=\max(|P|, |P'|)$, and a set $\mathcal{S}$ of $m$ axis-parallel unit squares. The goal is to find a subset $\mathcal{S}^{*}\subseteq \mathcal{S}$ that covers all the points in $P$ while…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: To appear in CALDAM 2024

  38. arXiv:2312.02312  [pdf, other]

    cs.LG cs.AI cs.CV

    Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games

    Authors: Lukas Schäfer, Logan Jones, Anssi Kanervisto, Yuhan Cao, Tabish Rashid, Raluca Georgescu, Dave Bignell, Siddhartha Sen, Andrea Treviño Gavito, Sam Devlin

    Abstract: Video games have served as useful benchmarks for the decision making community, but going beyond Atari games towards training agents in modern games has been prohibitively expensive for the vast majority of the research community. Recent progress in the research, development and open release of large vision models has the potential to amortize some of these costs across the community. However, it…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Preprint

  39. arXiv:2311.11199  [pdf, other]

    cs.RO

    HOUND: An Open-Source, Low-cost Research Platform for High-speed Off-road Underactuated Nonholonomic Driving

    Authors: Sidharth Talia, Matt Schmittle, Alexander Lambert, Alexander Spitzer, Christoforos Mavrogiannis, Siddhartha S. Srinivasa

    Abstract: Off-road vehicles are susceptible to rollovers in terrains with large elevation features, such as steep hills, ditches, and berms. One way to protect them against rollovers is ruggedization through the use of industrial-grade parts and physical modifications. However, this solution can be prohibitively expensive for academic research labs. Our key insight is that a software-based rollover-preventi…

    Submitted 18 November, 2023; originally announced November 2023.

    Comments: 6 Pages, 8 Figures

  40. arXiv:2311.07911  [pdf, other]

    cs.CL cs.AI cs.LG

    Instruction-Following Evaluation for Large Language Models

    Authors: Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, Le Hou

    Abstract: One core capability of Large Language Models (LLMs) is to follow natural language instructions. However, the evaluation of such abilities is not standardized: Human evaluations are expensive, slow, and not objectively reproducible, while LLM-based auto-evaluation is potentially biased or limited by the ability of the evaluator LLM. To overcome these issues, we introduce Instruction-Following Eval…

    Submitted 14 November, 2023; originally announced November 2023.

    MSC Class: 68T50 (Primary) 68T99 (Secondary) ACM Class: I.2.7

  41. arXiv:2310.12972  [pdf, other]

    cs.RO

    CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning

    Authors: Liyiming Ke, Yunchu Zhang, Abhay Deshpande, Siddhartha Srinivasa, Abhishek Gupta

    Abstract: We present a new technique to enhance the robustness of imitation learning methods by generating corrective data to account for compounding errors and disturbances. While existing methods rely on interactive expert labeling, additional offline datasets, or domain-specific invariances, our approach requires minimal additional assumptions beyond access to expert data. The key insight is to leverage…

    Submitted 3 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

  42. arXiv:2310.12033  [pdf, other]

    cs.LG stat.ML

    Conformal Drug Property Prediction with Density Estimation under Covariate Shift

    Authors: Siddhartha Laghuvarapu, Zhen Lin, Jimeng Sun

    Abstract: In drug discovery, it is vital to confirm the predictions of pharmaceutical properties from computational models using costly wet-lab experiments. Hence, obtaining reliable uncertainty estimates is crucial for prioritizing drug molecules for subsequent experimental validation. Conformal Prediction (CP) is a promising tool for creating such prediction sets for molecular properties with a coverage g…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023

  43. arXiv:2310.08881  [pdf, ps, other]

    cs.GT

    Online Resource Sharing via Dynamic Max-Min Fairness: Efficiency, Robustness and Non-Stationarity

    Authors: Giannis Fikioris, Siddhartha Banerjee, Éva Tardos

    Abstract: We study the allocation of shared resources over multiple rounds among competing agents, via a dynamic max-min fair (DMMF) mechanism: the good in each round is allocated to the requesting agent with the least number of allocations received to date. Previous work has shown that when an agent has i.i.d. values across rounds, then in the worst case, she can never get more than a constant strictly les…

    Submitted 13 February, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

  44. arXiv:2310.08004  [pdf, other]

    cs.CC quant-ph

    On the Rational Degree of Boolean Functions and Applications

    Authors: Vishnu Iyer, Siddhartha Jain, Matt Kovacs-Deak, Vinayak M. Kumar, Luke Schaeffer, Daochen Wang, Michael Whitmeyer

    Abstract: We study a natural complexity measure of Boolean functions known as the (exact) rational degree. For total functions $f$, it is conjectured that $\mathrm{rdeg}(f)$ is polynomially related to $\mathrm{deg}(f)$, where $\mathrm{deg}(f)$ is the Fourier degree. Towards this conjecture, we show that symmetric functions have rational degree at least $\mathrm{deg}(f)/2$ and monotone functions have rationa…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 17 pages, 3 figures

  45. arXiv:2310.07814  [pdf, other]

    cs.GR cs.CV cs.LG

    Explorable Mesh Deformation Subspaces from Unstructured Generative Models

    Authors: Arman Maesumi, Paul Guerrero, Vladimir G. Kim, Matthew Fisher, Siddhartha Chaudhuri, Noam Aigerman, Daniel Ritchie

    Abstract: Exploring variations of 3D shapes is a time-consuming process in traditional 3D modeling tools. Deep generative models of 3D shapes often feature continuous latent spaces that can, in principle, be used to explore potential variations starting from a set of input shapes. In practice, doing so can be problematic: latent spaces are high dimensional and hard to visualize, contain shapes that are not…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: SIGGRAPH Asia 2023, 15 pages

  46. arXiv:2310.07018  [pdf, other]

    cs.CL cs.AI cs.RO

    NEWTON: Are Large Language Models Capable of Physical Reasoning?

    Authors: Yi Ru Wang, Jiafei Duan, Dieter Fox, Siddhartha Srinivasa

    Abstract: Large Language Models (LLMs), through their contextualized representations, have been empirically proven to encapsulate syntactic, semantic, word sense, and common-sense knowledge. However, there has been limited exploration of their physical reasoning abilities, specifically concerning the crucial attributes for comprehending everyday objects. To address this gap, we introduce NEWTON, a repositor…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Findings; 8 pages, 3 figures, 7 tables; Project page: https://newtonreasoning.github.io

  47. arXiv:2310.05801  [pdf, other]

    cs.LG

    An operator preconditioning perspective on training in physics-informed machine learning

    Authors: Tim De Ryck, Florent Bonnet, Siddhartha Mishra, Emmanuel de Bézenac

    Abstract: In this paper, we investigate the behavior of gradient descent algorithms in physics-informed machine learning methods like PINNs, which minimize residuals connected to partial differential equations (PDEs). Our key result is that the difficulty in training these models is closely related to the conditioning of a specific differential operator. This operator, in turn, is associated to the Hermitia…

    Submitted 3 May, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  48. arXiv:2310.05365  [pdf, other]

    cs.LG cs.AI

    Molecular De Novo Design through Transformer-based Reinforcement Learning

    Authors: Pengcheng Xu, Tao Feng, Tianfan Fu, Siddhartha Laghuvarapu, Jimeng Sun

    Abstract: In this work, we introduce a method to fine-tune a Transformer-based generative model for molecular de novo design. Leveraging the superior sequence learning capacity of Transformers over Recurrent Neural Networks (RNNs), our model can generate molecular structures with desired properties effectively. In contrast to the traditional RNN-based models, our proposed method exhibits superior performanc…

    Submitted 8 March, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

  49. arXiv:2310.00230  [pdf, other]

    cs.CL cs.SD eess.AS

    SLM: Bridge the thin gap between speech and text foundation models

    Authors: Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Yongqiang Wang, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul Rubenstein, Lukas Zilka, Dian Yu, Zhong Meng, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu

    Abstract: We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. SLM freezes the pretrained foundation models to maximally preserve their capabilities, and only trains a simple adapter with just 1% (156M) of the foundation models' parameters. This adaptation not only leads SLM to achiev…

    Submitted 29 September, 2023; originally announced October 2023.

  50. arXiv:2309.15642  [pdf, other]

    quant-ph cond-mat.str-el cs.CE cs.LG

    Efficient tensor network simulation of IBM's largest quantum processors

    Authors: Siddhartha Patra, Saeed S. Jahromi, Sukhbinder Singh, Roman Orus

    Abstract: We show how quantum-inspired 2d tensor networks can be used to efficiently and accurately simulate the largest quantum processors from IBM, namely Eagle (127 qubits), Osprey (433 qubits) and Condor (1121 qubits). We simulate the dynamics of a complex quantum many-body system -- specifically, the kicked Ising experiment considered recently by IBM in Nature 618, p. 500-505 (2023) -- using graph-base…

    Submitted 2 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: 7 pages, 8 figures, revised version

    Journal ref: Phys. Rev. Research 6, 013326 (2024)