Showing 1–50 of 421 results for author: Vikram

Searching in archive cs.
  1. arXiv:2407.04620  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning to (Learn at Test Time): RNNs with Expressive Hidden States

    Authors: Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin

    Abstract: Self-attention performs well in long context but has quadratic complexity. Existing RNN layers have linear complexity, but their performance in long context is limited by the expressive power of their hidden state. We propose a new class of sequence modeling layers with linear complexity and an expressive hidden state. The key idea is to make the hidden state a machine learning model itself, and t…

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2407.00201  [pdf, other]

    q-bio.NC cs.LG eess.IV

    Deconvolving Complex Neuronal Networks into Interpretable Task-Specific Connectomes

    Authors: Yifan Wang, Vikram Ravindra, Ananth Grama

    Abstract: Task-specific functional MRI (fMRI) images provide excellent modalities for studying the neuronal basis of cognitive processes. We use fMRI data to formulate and solve the problem of deconvolving task-specific aggregate neuronal networks into a set of basic building blocks called canonical networks, to use these networks for functional characterization, and to characterize the physiological basis…

    Submitted 3 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

    Comments: 9 pages, 5 figures

  3. arXiv:2406.20077  [pdf, other]

    cs.CV

    HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Model

    Authors: Hieu T. Nguyen, Yiwen Chen, Vikram Voleti, Varun Jampani, Huaizu Jiang

    Abstract: We introduce HouseCrafter, a novel approach that can lift a floorplan into a complete large 3D indoor scene (e.g., a house). Our key insight is to adapt a 2D diffusion model, which is trained on web-scale images, to generate consistent multi-view color (RGB) and depth (D) images across different locations of the scene. Specifically, the RGB-D images are generated autoregressively in a batch-wise m…

    Submitted 28 June, 2024; originally announced June 2024.

  4. arXiv:2406.11217  [pdf, other]

    cs.AI cs.CL cs.CV physics.ao-ph

    WeatherQA: Can Multimodal Language Models Reason about Severe Weather?

    Authors: Chengqian Ma, Zhanxiang Hua, Alexandra Anderson-Frey, Vikram Iyer, Xin Liu, Lianhui Qin

    Abstract: Severe convective weather events, such as hail, tornadoes, and thunderstorms, often occur quickly yet cause significant damage, costing billions of dollars every year. This highlights the importance of forecasting severe weather threats hours in advance to better prepare meteorologists and residents in at-risk areas. Can modern large foundation models perform such forecasting? Existing weather ben…

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 26 pages, 9 figures

  5. arXiv:2406.07521  [pdf, other]

    cs.DS cs.LG

    Faster Spectral Density Estimation and Sparsification in the Nuclear Norm

    Authors: Yujia Jin, Ishani Karmarkar, Christopher Musco, Aaron Sidford, Apoorv Vikram Singh

    Abstract: We consider the problem of estimating the spectral density of the normalized adjacency matrix of an $n$-node undirected graph. We provide a randomized algorithm that, with $O(nε^{-2})$ queries to a degree and neighbor oracle and in $O(nε^{-3})$ time, estimates the spectrum up to $ε$ accuracy in the Wasserstein-1 metric. This improves on previous state-of-the-art methods, including an $O(nε^{-7})$…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2024

  6. arXiv:2406.03953  [pdf, other]

    cs.CL

    Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

    Authors: Neemesh Yadav, Sarah Masud, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty

    Abstract: Employing language models to generate explanations for an incoming implicit hate post is an active area of research. The explanation is intended to make explicit the underlying stereotype and aid content moderators. The training often combines top-k relevant knowledge graph (KG) tuples to provide world knowledge and improve performance on standard metrics. Interestingly, our study presents conflic…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 17 Pages, 5 Figures, 13 Tables, ACL Findings 2024

  7. arXiv:2406.02844  [pdf, other]

    cs.IR cs.CL

    Item-Language Model for Conversational Recommendation

    Authors: Li Yang, Anushya Subbiah, Hardik Patel, Judith Yue Li, Yanwei Song, Reza Mirghaderi, Vikram Aggarwal

    Abstract: Large-language Models (LLMs) have been extremely successful at tasks like complex dialogue understanding, reasoning and coding due to their emergent abilities. These emergent abilities have been extended with multi-modality to include image, audio, and video capabilities. Recommender systems, on the other hand, have been critical for information seeking and item discovery needs. Recently, there ha…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 15 pages, 3 figures

  8. arXiv:2406.01606  [pdf, other]

    cs.IR cs.AI cs.CL

    SymTax: Symbiotic Relationship and Taxonomy Fusion for Effective Citation Recommendation

    Authors: Karan Goyal, Mayank Goel, Vikram Goyal, Mukesh Mohania

    Abstract: Citing pertinent literature is pivotal to writing and reviewing a scientific document. Existing techniques mainly focus on the local context or the global context for recommending citations but fail to consider the actual human citation behaviour. We propose SymTax, a three-stage recommendation architecture that considers both the local and the global context, and additionally the taxonomical repr…

    Submitted 26 May, 2024; originally announced June 2024.

    Comments: Accepted in ACL 2024

  9. arXiv:2405.18574  [pdf, other]

    cs.SE

    SpecTra: Enhancing the Code Translation Ability of Language Models by Generating Multi-Modal Specifications

    Authors: Vikram Nitin, Baishakhi Ray

    Abstract: Large language models (LLMs) are increasingly being used for the task of automated code translation, which has important real-world applications. However, most existing approaches use only the source code of a program as an input to an LLM, and do not consider the different kinds of specifications that can be extracted from a program. In this paper, we propose SpecTra, a multi-stage approach that…

    Submitted 28 May, 2024; originally announced May 2024.

  10. arXiv:2405.17442  [pdf, other]

    cs.NI cs.AI cs.LG cs.OS

    Leveraging Machine Learning for Accurate IoT Device Identification in Dynamic Wireless Contexts

    Authors: Bhagyashri Tushir, Vikram K Ramanna, Yuhong Liu, Behnam Dezfouli

    Abstract: Identifying IoT devices is crucial for network monitoring, security enforcement, and inventory tracking. However, most existing identification methods rely on deep packet inspection, which raises privacy concerns and adds computational complexity. More importantly, existing works overlook the impact of wireless channel dynamics on the accuracy of layer-2 features, thereby limiting their effectiven…

    Submitted 15 May, 2024; originally announced May 2024.

    Report number: SIOTLAB-RP-M24-SEC

  11. arXiv:2405.16445  [pdf]

    cs.RO

    Robotic Path Planning Implementation using Search Algorithms

    Authors: Vikram Shahapur, Blessing Dixon, Urvishkumar Bharti

    Abstract: Till now, many path planning algorithms have been proposed in the literature. The objective of these algorithms is to find the quickest path between initial position to the end position in a certain environment. The complexity of these algorithms depends on the internal parameters such as motor speed or sensor range and on other external parameters, including the accuracy of the map, size of the e…

    Submitted 26 May, 2024; originally announced May 2024.

  12. arXiv:2405.12742  [pdf, other]

    cs.CV

    Multi-Subject Personalization

    Authors: Arushi Jain, Shubham Paliwal, Monika Sharma, Vikram Jamwal, Lovekesh Vig

    Abstract: Creative story illustration requires a consistent interplay of multiple characters or objects. However, conventional text-to-image models face significant challenges while producing images featuring multiple personalized subjects. For example, they distort the subject rendering, or the text descriptions fail to render coherent subject interactions. We present Multi-Subject Personalization (MSP) to…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 2023 Conference on Neural Information Processing Systems

  13. arXiv:2405.12531  [pdf, other]

    cs.CV cs.LG

    CustomText: Customized Textual Image Generation using Diffusion Models

    Authors: Shubham Paliwal, Arushi Jain, Monika Sharma, Vikram Jamwal, Lovekesh Vig

    Abstract: Textual image generation spans diverse fields like advertising, education, product packaging, social media, information visualization, and branding. Despite recent strides in language-guided image synthesis using diffusion models, current models excel in image generation but struggle with accurate text rendering and offer limited control over font attributes. In this paper, we aim to enhance the s…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted by AI for Content Creation (AI4CC) workshop at CVPR 2024

  14. arXiv:2405.07417  [pdf, other]

    cs.SI eess.SP

    Identifying Hate Speech Peddlers in Online Platforms. A Bayesian Social Learning Approach for Large Language Model Driven Decision-Makers

    Authors: Adit Jain, Vikram Krishnamurthy

    Abstract: This paper studies the problem of autonomous agents performing Bayesian social learning for sequential detection when the observations of the state belong to a high-dimensional space and are expensive to analyze. Specifically, when the observations are textual, the Bayesian agent can use a large language model (LLM) as a map to get a low-dimensional private observation. The agent performs Bayesian…

    Submitted 12 May, 2024; originally announced May 2024.

  15. arXiv:2405.07415  [pdf, ps, other]

    cs.LG eess.SY

    Structured Reinforcement Learning for Incentivized Stochastic Covert Optimization

    Authors: Adit Jain, Vikram Krishnamurthy

    Abstract: This paper studies how a stochastic gradient algorithm (SG) can be controlled to hide the estimate of the local stationary point from an eavesdropper. Such problems are of significant interest in distributed optimization settings like federated learning and inventory management. A learner queries a stochastic oracle and incentivizes the oracle to obtain noisy gradient measurements and perform SG.…

    Submitted 12 May, 2024; originally announced May 2024.

  16. Investigating Interaction Modes and User Agency in Human-LLM Collaboration for Domain-Specific Data Analysis

    Authors: Jiajing Guo, Vikram Mohanty, Jorge Piazentin Ono, Hongtao Hao, Liang Gou, Liu Ren

    Abstract: Despite demonstrating robust capabilities in performing tasks related to general-domain data-operation tasks, Large Language Models (LLMs) may exhibit shortcomings when applied to domain-specific tasks. We consider the design of domain-specific AI-powered data analysis tools from two dimensions: interaction and user agency. We implemented two design probes that fall on the two ends of the two dime…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: CHI'24 Late-Breaking Work

    ACM Class: H.5.2

  17. arXiv:2405.05213  [pdf, other]

    math.NA cs.CE

    Exponential time propagators for elastodynamics

    Authors: Paavai Pari, Bikash Kanungo, Vikram Gavini

    Abstract: We propose a computationally efficient and systematically convergent approach for elastodynamics simulations. We recast the second-order dynamical equation of elastodynamics into an equivalent first-order system of coupled equations, so as to express the solution in the form of a Magnus expansion. With any spatial discretization, it entails computing the exponential of a matrix acting upon a vecto…

    Submitted 8 May, 2024; originally announced May 2024.

  18. arXiv:2404.15391  [pdf, ps, other]

    cs.GT econ.GN

    Adaptive Mechanism Design using Multi-Agent Revealed Preferences

    Authors: Luke Snow, Vikram Krishnamurthy

    Abstract: This paper constructs an algorithmic framework for adaptively achieving the mechanism design objective, finding a mechanism inducing socially optimal Nash equilibria, without knowledge of the utility functions of the agents. We consider a probing scheme where the designer can iteratively enact mechanisms and observe Nash equilibria responses. We first derive necessary and sufficient conditions, ta…

    Submitted 23 April, 2024; originally announced April 2024.

  19. arXiv:2404.12504  [pdf, other]

    cs.HC cs.RO

    Using Capability Maps Tailored to Arm Range of Motion in VR Exergames for Rehabilitation

    Authors: Christian Lourido, Zaid Waghoo, Hassam Khan Wazir, Nishtha Bhagat, Vikram Kapila

    Abstract: Many neurological conditions, e.g., a stroke, can cause patients to experience upper limb (UL) motor impairments that hinder their daily activities. For such patients, while rehabilitation therapy is key for regaining autonomy and restoring mobility, its long-term nature entails ongoing time commitment and it is often not sufficiently engaging. Virtual reality (VR) can transform rehabilitation the…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 4 pages, 4 figures. Paper accepted at IEEE International Conference on Engineering in Medicine & Biology Society, 2024

  20. arXiv:2404.10310  [pdf, other]

    eess.AS cs.LG

    Wireless Earphone-based Real-Time Monitoring of Breathing Exercises: A Deep Learning Approach

    Authors: Hassam Khan Wazir, Zaid Waghoo, Vikram Kapila

    Abstract: Several therapy routines require deep breathing exercises as a key component and patients undergoing such therapies must perform these exercises regularly. Assessing the outcome of a therapy and tailoring its course necessitates monitoring a patient's compliance with the therapy. While therapy compliance monitoring is routine in a clinical environment, it is challenging to do in an at-home setting…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 4 pages, 2 figures. Paper accepted at IEEE International Conference on Engineering in Medicine & Biology Society, 2024

  21. arXiv:2404.07839  [pdf, other]

    cs.LG cs.AI cs.CL

    RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

    Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

    Abstract: We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned var…

    Submitted 11 April, 2024; originally announced April 2024.

  22. arXiv:2404.03130  [pdf, other]

    cs.HC

    Biodegradable Interactive Materials

    Authors: Zhihan Zhang, Mallory Parker, Kuotian Liao, Jerry Cao, Anandghan Waghmare, Joseph Breda, Chris Matsumura, Serena Eley, Eleftheria Roumeli, Shwetak Patel, Vikram Iyer

    Abstract: The sense of touch is fundamental to how we interact with the physical and digital world. Conventional interactive surfaces and tactile interfaces use electronic sensors embedded into objects, however this approach poses serious challenges both for environmental sustainability and a future of truly ubiquitous interaction systems where information is encoded into everyday objects. In this work, we…

    Submitted 3 April, 2024; originally announced April 2024.

  23. arXiv:2404.00538  [pdf, ps, other]

    cs.CR stat.AP

    Eclipse Attack Detection on a Blockchain Network as a Non-Parametric Change Detection Problem

    Authors: Anurag Gupta, Vikram Krishnamurthy, Brian M. Sadler

    Abstract: This paper introduces a novel non-parametric change detection algorithm to identify eclipse attacks on a blockchain network; the non-parametric algorithm relies only on the empirical mean and variance of the dataset, making it highly adaptable. An eclipse attack occurs when malicious actors isolate blockchain users, disrupting their ability to reach consensus with the broader network, thereby dist…

    Submitted 30 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

  24. arXiv:2403.12008  [pdf, other]

    cs.CV

    SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

    Authors: Vikram Voleti, Chun-Han Yao, Mark Boss, Adam Letts, David Pankratz, Dmitry Tochilkin, Christian Laforte, Robin Rombach, Varun Jampani

    Abstract: We present Stable Video 3D (SV3D) -- a latent video diffusion model for high-resolution, image-to-multi-view generation of orbital videos around a 3D object. Recent work on 3D generation propose techniques to adapt 2D generative models for novel view synthesis (NVS) and 3D optimization. However, these methods have several disadvantages due to either limited views or inconsistent NVS, thereby affec…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: https://sv3d.github.io/

  25. arXiv:2403.09810  [pdf, other]

    cs.HC cs.AI cs.LG

    LabelAId: Just-in-time AI Interventions for Improving Human Labeling Quality and Domain Knowledge in Crowdsourcing Systems

    Authors: Chu Li, Zhihan Zhang, Michael Saugstad, Esteban Safranchik, Minchu Kulkarni, Xiaoyu Huang, Shwetak Patel, Vikram Iyer, Tim Althoff, Jon E. Froehlich

    Abstract: Crowdsourcing platforms have transformed distributed problem-solving, yet quality control remains a persistent challenge. Traditional quality control measures, such as prescreening workers and refining instructions, often focus solely on optimizing economic output. This paper explores just-in-time AI interventions to enhance both labeling quality and domain-specific knowledge among crowdworkers. W…

    Submitted 14 March, 2024; originally announced March 2024.

  26. arXiv:2403.06313  [pdf]

    cs.LG cs.AI

    Optimal Policy Sparsification and Low Rank Decomposition for Deep Reinforcement Learning

    Authors: Vikram Goddla

    Abstract: Deep reinforcement learning (DRL) has shown significant promise in a wide range of applications including computer games and robotics. Yet, training DRL policies consume extraordinary computing resources resulting in dense policies which are prone to overfitting. Moreover, inference with dense DRL policies limit their practical applications, especially in edge computing. Techniques such as pruning…

    Submitted 10 March, 2024; originally announced March 2024.

  27. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  28. arXiv:2403.02286  [pdf, other]

    cs.DB

    Stage: Query Execution Time Prediction in Amazon Redshift

    Authors: Ziniu Wu, Ryan Marcus, Zhengchun Liu, Parimarjan Negi, Vikram Nathan, Pascal Pfeil, Gaurav Saxena, Mohammad Rahman, Balakrishnan Narayanaswamy, Tim Kraska

    Abstract: Query performance (e.g., execution time) prediction is a critical component of modern DBMSes. As a pioneering cloud data warehouse, Amazon Redshift relies on an accurate execution time prediction for many downstream tasks, ranging from high-level optimizations, such as automatically creating materialized views, to low-level tasks on the critical path of query execution, such as admission, scheduli…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 15 pages

  29. arXiv:2402.09288  [pdf, other]

    cs.LG

    EcoVal: An Efficient Data Valuation Framework for Machine Learning

    Authors: Ayush K Tarun, Vikram S Chundawat, Murari Mandal, Hong Ming Tan, Bowei Chen, Mohan Kankanhalli

    Abstract: Quantifying the value of data within a machine learning workflow can play a pivotal role in making more strategic decisions in machine learning initiatives. The existing Shapley value based frameworks for data valuation in machine learning are computationally expensive as they require considerable amount of repeated training of the model to obtain the Shapley value. In this paper, we introduce an…

    Submitted 7 April, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  30. arXiv:2402.04447  [pdf, other]

    cs.NI eess.SP

    Context-Aware Spectrum Coexistence of Terrestrial Beyond 5G Networks in Satellite Bands

    Authors: Ta Seen Reaz Niloy, Zoheb Hasan, Rob Smith, Vikram R. Anapana, Vijay K. Shah

    Abstract: Spectrum sharing between terrestrial 5G and incumbent networks in the satellite bands presents a promising avenue to satisfy the ever-increasing bandwidth demand of the next-generation wireless networks. However, protecting incumbent operations from harmful interference poses a fundamental challenge in accommodating terrestrial broadband cellular networks in the satellite bands. State-of-the-art s…

    Submitted 14 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  31. arXiv:2402.02144  [pdf, other]

    cs.CL

    Probing Critical Learning Dynamics of PLMs for Hate Speech Detection

    Authors: Sarah Masud, Mohammad Aflah Khan, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty

    Abstract: Despite the widespread adoption, there is a lack of research into how various critical aspects of pretrained language models (PLMs) affect their performance in hate speech detection. Through five research questions, our findings and recommendations lay the groundwork for empirically investigating different aspects of PLMs' use in hate speech detection. We deep dive into comparing different pretrai…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: 20 pages, 9 figures, 14 tables. Accepted at EACL'24

  32. arXiv:2401.16914  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Energy-conserving equivariant GNN for elasticity of lattice architected metamaterials

    Authors: Ivan Grega, Ilyes Batatia, Gábor Csányi, Sri Karlapati, Vikram S. Deshpande

    Abstract: Lattices are architected metamaterials whose properties strongly depend on their geometrical design. The analogy between lattices and graphs enables the use of graph neural networks (GNNs) as a faster surrogate model compared to traditional methods such as finite element modelling. In this work, we generate a big dataset of structure-property relationships for strut-based lattices. The dataset is…

    Submitted 20 March, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: International Conference on Learning Representations 2024

  33. arXiv:2401.15906  [pdf, other]

    cs.CR cs.IT stat.AP

    Mean Estimation with User-Level Privacy for Spatio-Temporal IoT Datasets

    Authors: V. Arvind Rameshwar, Anshoo Tandon, Prajjwal Gupta, Aditya Vikram Singh, Novoneel Chakraborty, Abhay Sharma

    Abstract: This paper considers the problem of the private release of sample means of speed values from traffic datasets. Our key contribution is the development of user-level differentially private algorithms that incorporate carefully chosen parameter values to ensure low estimation errors on real-world datasets, while ensuring privacy. We test our algorithms on ITMS (Intelligent Traffic Management System)…

    Submitted 25 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 14 pages, 5 figures, submitted to the ACM for possible publication

  34. arXiv:2401.14581  [pdf, other]

    cs.CY cs.HC

    AVELA -- A Vision for Engineering Literacy & Access: Understanding Why Technology Alone Is Not Enough

    Authors: Kyle Johnson, Vicente Arroyos, Celeste Garcia, Liban Hussein, Aisha Cora, Tsewone Melaku, Jay L. Cunningham, R. Benjamin Shapiro, Vikram Iyer

    Abstract: Unequal technology access for Black and Latine communities has been a persistent economic, social justice, and human rights issue despite increased technology accessibility due to advancements in consumer electronics like phones, tablets, and computers. We contextualize socio-technical access inequalities for Black and Latine urban communities and find that many students are hesitant to engage wit…

    Submitted 29 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: This is the author's version of the work. It is posted here for personal use, not for redistribution

  35. arXiv:2401.13649  [pdf, other]

    cs.LG cs.CL cs.CV

    VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

    Authors: Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried

    Abstract: Autonomous agents capable of planning, reasoning, and executing actions on the web offer a promising avenue for automating computer tasks. However, the majority of existing benchmarks primarily focus on text-based agents, neglecting many natural tasks that require visual information to effectively solve. Given that most computer interfaces cater to human perception, visual information often augmen…

    Submitted 5 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted to ACL 2024. 24 pages. Project page: https://jykoh.com/vwa

  36. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  37. arXiv:2312.11509  [pdf, other]

    cs.CL cs.LG eess.AS

    Toward a Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency

    Authors: Pavlos Constas, Vikram Rawal, Matthew Honorio Oliveira, Andreas Constas, Aditya Khan, Kaison Cheung, Najma Sultani, Carrie Chen, Micol Altomare, Michael Akzam, Jiacheng Chen, Vhea He, Lauren Altomare, Heraa Murqi, Asad Khan, Nimit Amikumar Bhanshali, Youssef Rachad, Michael Guerzhoy

    Abstract: We propose a reinforcement learning (RL)-based system that would automatically prescribe a hypothetical patient medication that may help the patient with their mental health-related speech disfluency, and adjust the medication and the dosages in response to zero-cost frequent measurement of the fluency of the patient. We demonstrate the components of the system: a module that detects and evaluates…

    Submitted 5 February, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: In Proc. Machine Learning for Cognitive and Mental Health Workshop (ML4CMH) at AAAI 2024

  38. arXiv:2312.10369  [pdf, other]

    cs.GT cs.AI

    Proportional Representation in Metric Spaces and Low-Distortion Committee Selection

    Authors: Yusuf Hakan Kalayci, David Kempe, Vikram Kher

    Abstract: We introduce a novel definition for a small set R of k points being "representative" of a larger set in a metric space. Given a set V (e.g., documents or voters) to represent, and a set C of possible representatives, our criterion requires that for any subset S comprising a theta fraction of V, the average distance of S to their best theta*k points in R should not be more than a factor gamma compa…

    Submitted 23 January, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

    Comments: 24 pages, Accepted to AAAI 24

  39. arXiv:2312.06022  [pdf, other]

    cs.CL

    Exploiting Representation Bias for Data Distillation in Abstractive Text Summarization

    Authors: Yash Kumar Atri, Vikram Goyal, Tanmoy Chakraborty

    Abstract: Abstractive text summarization is surging with the number of training samples to cater to the needs of the deep learning models. These models tend to exploit the training data representations to attain superior performance by improving the quantitative element of the resultant summary. However, increasing the size of the training set may not always be the ideal solution to maximize the performance…

    Submitted 20 December, 2023; v1 submitted 10 December, 2023; originally announced December 2023.

  40. arXiv:2312.02179  [pdf, other]

    cs.LG cs.AI cs.CL

    Training Chain-of-Thought via Latent-Variable Inference

    Authors: Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous

    Abstract: Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a "chain-of-thought" (CoT) prompt. One can also improve LLMs' performance on a specific task by supervised fine-tuning, i.e., by using gradient ascent on some tunable parameters to maximize the average log-likelihood of correct answers from a labeled training se…

    Submitted 28 November, 2023; originally announced December 2023.

    Comments: 23 pages, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  41. arXiv:2311.15127  [pdf, other

    cs.CV

    Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

    Authors: Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach

    Abstract: We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into generative video models by inserting temporal layers and finetuning them on small, high-quality video datasets. However, training methods in the literature vary wi… ▽ More

    Submitted 25 November, 2023; originally announced November 2023.

  42. From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models

    Authors: Zachary Englhardt, Chengqian Ma, Margaret E. Morris, Xuhai "Orson" Xu, Chun-Cheng Chang, Lianhui Qin, Daniel McDuff, Xin Liu, Shwetak Patel, Vikram Iyer

    Abstract: Passively collected behavioral health data from ubiquitous sensors holds significant promise to provide mental health professionals insights from patient's daily lives; however, developing analysis tools to use this data in clinical practice requires addressing challenges of generalization across devices and weak or ambiguous correlations between the measured signals and an individual's mental hea… ▽ More

    Submitted 25 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

  43. arXiv:2311.10652  [pdf, other

    cs.HC

    What Lies Beneath? Exploring the Impact of Underlying AI Model Updates in AI-Infused Systems

    Authors: Vikram Mohanty, Jude Lim, Kurt Luther

    Abstract: As AI models evolve, understanding the influence of underlying models on user experience and performance in AI-infused systems becomes critical, particularly while transitioning between different model versions. We studied the influence of model change by conducting two complementary studies in the context of AI-based facial recognition for historical person identification tasks. First, we ran an… ▽ More

    Submitted 17 November, 2023; originally announced November 2023.

  44. arXiv:2311.09611  [pdf, other

    cs.HC

    DeltaLCA: Comparative Life-Cycle Assessment for Electronics Design

    Authors: Zhihan Zhang, Felix Hähnlein, Yuxuan Mei, Zachary Englhardt, Shwetak Patel, Adriana Schulz, Vikram Iyer

    Abstract: Reducing the environmental footprint of electronics and computing devices requires new tools that empower designers to make informed decisions about sustainability during the design process itself. This is not possible with current tools for life cycle assessment (LCA) which require substantial domain expertise and time to evaluate the numerous chips and other components that make up a device. We… ▽ More

    Submitted 16 November, 2023; originally announced November 2023.

  45. arXiv:2311.06323  [pdf, ps, other

    cs.IR cs.AI cs.LG

    Reviewing Developments of Graph Convolutional Network Techniques for Recommendation Systems

    Authors: Haojun Zhu, Vikram Kapoor, Priya Sharma

    Abstract: The Recommender system is a vital information service on today's Internet. Recently, graph neural networks have emerged as the leading approach for recommender systems. We try to review recent literature on graph neural network-based recommender systems, covering the background and development of both recommender systems and graph neural networks. Then categorizing recommender systems by their set… ▽ More

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2103.08976 by other authors

  46. arXiv:2311.04588  [pdf, other

    cs.LG cs.AI cs.CR cs.CV

    Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based sample selection

    Authors: Akshit Jindal, Vikram Goyal, Saket Anand, Chetan Arora

    Abstract: Machine Learning (ML) models become vulnerable to Model Stealing Attacks (MSA) when they are deployed as a service. In such attacks, the deployed model is queried repeatedly to build a labelled dataset. This dataset allows the attacker to train a thief model that mimics the original model. To maximize query efficiency, the attacker has to select the most informative subset of data points from the… ▽ More

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures, paper accepted to WACV 2024

  47. arXiv:2311.00626  [pdf, other

    cs.RO

    nvblox: GPU-Accelerated Incremental Signed Distance Field Mapping

    Authors: Alexander Millane, Helen Oleynikova, Emilie Wirbel, Remo Steiner, Vikram Ramasamy, David Tingdahl, Roland Siegwart

    Abstract: Dense, volumetric maps are essential to enable robot navigation and interaction with the environment. To achieve low latency, dense maps are typically computed onboard the robot, often on computationally constrained hardware. Previous works leave a gap between CPU-based systems for robotic mapping which, due to computation constraints, limit map resolution or scale, and GPU-based reconstruction sy… ▽ More

    Submitted 15 March, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted to ICRA 2024

  48. arXiv:2310.16872  [pdf, other

    eess.IV cs.CV

    SonoSAMTrack -- Segment and Track Anything on Ultrasound Images

    Authors: Hariharan Ravishankar, Rohan Patil, Vikram Melapudi, Harsh Suthar, Stephan Anzengruber, Parminder Bhatia, Kass-Hout Taha, Pavan Annangi

    Abstract: In this paper, we present SonoSAMTrack - that combines a promptable foundational model for segmenting objects of interest on ultrasound images called SonoSAM, with a state-of-the-art contour tracking model to propagate segmentations on 2D+t and 3D ultrasound datasets. Fine-tuned and tested exclusively on a rich, diverse set of objects from $\approx200$k ultrasound image-mask pairs, SonoSAM demonst… ▽ More

    Submitted 16 November, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

  49. arXiv:2310.13157  [pdf, other

    cs.CV cs.AI cs.LG

    Conditional Generative Modeling for Images, 3D Animations, and Video

    Authors: Vikram Voleti

    Abstract: This dissertation attempts to drive innovation in the field of generative modeling for computer vision, by exploring novel formulations of conditional generative models, and innovative applications in images, 3D animations, and video. Our research focuses on architectures that offer reversible transformations of noise and visual data, and the application of encoder-decoder architectures for genera… ▽ More

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Doctoral thesis, Mila, University of Montreal. 189 pages

  50. arXiv:2310.12174  [pdf, other

    physics.soc-ph cs.CE eess.SY

    A Traffic Control Framework for Uncrewed Aircraft Systems

    Authors: Ananay Vikram Gupta, Aaditya Prakash Kattekola, Ansh Vikram Gupta, Dacharla Venkata Abhiram, Kamesh Namuduri, Ravichandran Subramanian

    Abstract: The exponential growth of Advanced Air Mobility (AAM) services demands assurances of safety in the airspace. This research presents a Traffic Control Framework (TCF) for developing digital flight rules for Uncrewed Aircraft System (UAS) flying in designated air corridors. The proposed TCF helps model, deploy, and test UAS control, agents, regardless of their hardware configurations. This paper investigates… ▽ More

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: 6 pages, 7 figures