Showing 1–50 of 79 results for author: Narasimhan, K

  1. arXiv:2406.12045

    cs.AI cs.CL

    $τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

    Authors: Shunyu Yao, Noah Shinn, Pedram Razavi, Karthik Narasimhan

    Abstract: Existing benchmarks do not test language agents on their interaction with human users or ability to follow domain-specific rules, both of which are vital for deploying them in real world applications. We propose $τ$-bench, a benchmark emulating dynamic conversations between a user (simulated by language models) and a language agent provided with domain-specific API tools and policy guidelines. We…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2406.11930

    cs.SE cs.AI cs.CL

    A Critical Study of What Code-LLMs (Do Not) Learn

    Authors: Abhinav Anand, Shweta Verma, Krishna Narasimhan, Mira Mezini

    Abstract: Large Language Models trained on code corpora (code-LLMs) have demonstrated impressive performance in various coding assistance tasks. However, despite their increased size and training dataset, code-LLMs still have limitations such as suggesting code with syntactic errors, variable misuse, etc. Some studies argue that code-LLMs perform well on coding tasks because they use self-attention and hidd…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2405.15793

    cs.SE cs.AI cs.CL cs.HC cs.LG

    SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

    Authors: John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press

    Abstract: Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like software engineering, we posit that LM agents represent a new category of end users with their own needs and abilities, and would benefit from specially-built int…

    Submitted 30 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Code, data, and demo available at https://swe-agent.com

  4. arXiv:2404.10952

    cs.CL cs.AI cs.PL

    Can Language Models Solve Olympiad Programming?

    Authors: Quan Shi, Michael Tang, Karthik Narasimhan, Shunyu Yao

    Abstract: Computing olympiads contain some of the most challenging problems for humans, requiring complex algorithmic reasoning, puzzle solving, in addition to generating efficient code. However, it has been understudied as a domain to evaluate language models (LMs). In this paper, we introduce the USACO benchmark with 307 problems from the USA Computing Olympiad, along with high-quality unit tests, referen…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Code and data: https://princeton-nlp.github.io/USACOBench/

  5. arXiv:2404.08555

    cs.LG cs.AI cs.CL

    RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs

    Authors: Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva

    Abstract: State-of-the-art large language models (LLMs) have become indispensable tools for various tasks. However, training LLMs to serve as effective assistants for humans requires careful consideration. A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hal…

    Submitted 15 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  6. arXiv:2402.01695

    cs.CL cs.AI cs.LG

    Language-Guided World Models: A Model-Based Approach to AI Control

    Authors: Alex Zhang, Khanh Nguyen, Jens Tuyls, Albert Lin, Karthik Narasimhan

    Abstract: Installing probabilistic world models into artificial agents opens an efficient channel for humans to communicate with and control these agents. In addition to updating agent policies, humans can modify their internal world models in order to influence their decisions. The challenge, however, is that currently existing world models are difficult for humans to adapt because they lack a natural comm…

    Submitted 23 January, 2024; originally announced February 2024.

  7. Towards Trustworthy AI Software Development Assistance

    Authors: Daniel Maninger, Krishna Narasimhan, Mira Mezini

    Abstract: It is expected that in the near future, AI software development assistants will play an important role in the software industry. However, current software development assistants tend to be unreliable, often producing incorrect, unsafe, or low-quality code. We seek to resolve these issues by introducing a holistic architecture for constructing, training, and using trustworthy AI software developmen…

    Submitted 23 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 6 pages, 1 figure; to be published in New Ideas and Emerging Results (ICSE-NIER'24), April 14-20, 2024, Lisbon, Portugal; updated version to reflect the information provided by ACM

  8. arXiv:2311.09735

    cs.LG cs.IR

    GEO: Generative Engine Optimization

    Authors: Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande

    Abstract: The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unified framework of generative engines (GEs), can generate accurate and personalized responses, rapidly replacing traditional search engines like Google and Bing. Gen…

    Submitted 28 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to KDD 2024

  9. arXiv:2311.02807

    cs.LG cs.AI cs.CL

    QualEval: Qualitative Evaluation for Model Improvement

    Authors: Vishvak Murahari, Ameet Deshpande, Peter Clark, Tanmay Rajpurohit, Ashish Sabharwal, Karthik Narasimhan, Ashwin Kalyan

    Abstract: Quantitative evaluation metrics have traditionally been pivotal in gauging the advancements of artificial intelligence systems, including large language models (LLMs). However, these metrics have inherent limitations. Given the intricate nature of real-world tasks, a single scalar to quantify and compare is insufficient to capture the fine-grained nuances of model behavior. Metrics serve only as a…

    Submitted 5 May, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  10. arXiv:2310.13004

    cs.LG cs.AI cs.HC

    Progressively Efficient Learning

    Authors: Ruijie Zheng, Khanh Nguyen, Hal Daumé III, Furong Huang, Karthik Narasimhan

    Abstract: Assistant AI agents should be capable of rapidly acquiring novel skills and adapting to new user preferences. Traditional frameworks like imitation learning and reinforcement learning do not facilitate this capability because they support only low-level, inefficient forms of communication. In contrast, humans communicate with progressive efficiency by defining and sharing abstract intentions. Repr…

    Submitted 13 October, 2023; originally announced October 2023.

  11. arXiv:2310.06770

    cs.CL cs.AI cs.SE

    SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

    Authors: Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan

    Abstract: Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We find real-world software engineering to be a rich, sustainable, and challenging testbed for evaluating the next generation of language models. To this end, we introduce SWE-bench, an evaluation framework consisting of $2,294$ softw…

    Submitted 5 April, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Data, code, and leaderboard are available at https://www.swebench.com. ICLR 2024, https://openreview.net/forum?id=VTF8yNQM66

  12. arXiv:2310.05915

    cs.CL cs.AI cs.LG

    FireAct: Toward Language Agent Fine-tuning

    Authors: Baian Chen, Chang Shu, Ehsan Shareghi, Nigel Collier, Karthik Narasimhan, Shunyu Yao

    Abstract: Recent efforts have augmented language models (LMs) with external tools or environments, leading to the development of language agents that can reason and act. However, most of these agents rely on few-shot prompting techniques with off-the-shelf LMs. In this paper, we investigate and argue for the overlooked direction of fine-tuning LMs to obtain language agents. Using a setup of question answeri…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Code, data, and models are available at https://fireact-agent.github.io

  13. arXiv:2309.02427

    cs.AI cs.CL cs.LG cs.SC

    Cognitive Architectures for Language Agents

    Authors: Theodore R. Sumers, Shunyu Yao, Karthik Narasimhan, Thomas L. Griffiths

    Abstract: Recent efforts have augmented large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning, leading to a new class of language agents. While these agents have achieved substantial empirical success, we lack a systematic framework to organize existing agents and plan future developments. In thi…

    Submitted 15 March, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: v3 is TMLR camera ready version. 19 pages of main content, 5 figures. The first two authors contributed equally, order decided by coin flip. A CoALA-based repo of recent work on language agents: https://github.com/ysymyth/awesome-language-agents

  14. arXiv:2307.09423

    cs.LG cs.AI stat.ML

    Scaling Laws for Imitation Learning in Single-Agent Games

    Authors: Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade

    Abstract: Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, many works find it is often unable to fully recover the underlying expert behavior, even in constrained environments like single-agent games. However, none of these works deeply investigate the role of scaling up the model and data size. Inspired by recent work in Natural Language Processing (NLP) where "scali…

    Submitted 10 March, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

  15. arXiv:2307.08689

    cs.CL cs.AI cs.LG

    COLLIE: Systematic Construction of Constrained Text Generation Tasks

    Authors: Shunyu Yao, Howard Chen, Austin W. Hanjie, Runzhe Yang, Karthik Narasimhan

    Abstract: Text generation under constraints has seen increasing interest in natural language processing, especially with the rapidly improving capabilities of large language models. However, existing benchmarks for constrained generation usually focus on fixed constraint types (e.g., generate a sentence containing certain words) that have proved to be easy for state-of-the-art models like GPT-4. We present…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 18 pages, 12 figures

  16. arXiv:2307.00259

    cs.CL cs.AI

    InstructEval: Systematic Evaluation of Instruction Selection Methods

    Authors: Anirudh Ajith, Chris Pan, Mengzhou Xia, Ameet Deshpande, Karthik Narasimhan

    Abstract: In-context learning (ICL) performs tasks by prompting a large language model (LLM) using an instruction and a small set of annotated examples called demonstrations. Recent work has shown that precise details of the inputs used in the ICL prompt significantly impact performance, which has incentivized instruction selection algorithms. The effect of instruction-choice however is severely underexplor…

    Submitted 16 July, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

    Comments: 8 content pages + 3 pages of supplementary material, 3 figures, 10 tables

  17. arXiv:2306.14898

    cs.CL cs.LG cs.SE

    InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback

    Authors: John Yang, Akshara Prabhakar, Karthik Narasimhan, Shunyu Yao

    Abstract: Humans write code in a fundamentally interactive manner and rely on constant execution feedback to correct errors, resolve ambiguities, and decompose tasks. While LLMs have recently exhibited promising coding capabilities, current coding benchmarks mostly consider a static instruction-to-code sequence transduction process, which has the potential for error propagation and a disconnect between the…

    Submitted 30 October, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: Project site with code and data: https://intercode-benchmark.github.io

  18. arXiv:2305.15098

    cs.CL

    Referral Augmentation for Zero-Shot Information Retrieval

    Authors: Michael Tang, Shunyu Yao, John Yang, Karthik Narasimhan

    Abstract: We propose Referral-Augmented Retrieval (RAR), a simple technique that concatenates document indices with referrals, i.e. text from other documents that cite or link to the given document, to provide significant performance gains for zero-shot information retrieval. The key insight behind our method is that referrals provide a more complete, multi-view representation of a document, much like incom…

    Submitted 24 May, 2023; originally announced May 2023.

  19. arXiv:2305.15093

    cs.CL cs.AI cs.LG

    C-STS: Conditional Semantic Textual Similarity

    Authors: Ameet Deshpande, Carlos E. Jimenez, Howard Chen, Vishvak Murahari, Victoria Graf, Tanmay Rajpurohit, Ashwin Kalyan, Danqi Chen, Karthik Narasimhan

    Abstract: Semantic textual similarity (STS), a cornerstone task in NLP, measures the degree of similarity between a pair of sentences, and has broad application in fields such as information retrieval and natural language understanding. However, sentence similarity can be inherently ambiguous, depending on the specific aspect of interest. We resolve this ambiguity by proposing a novel task called Conditiona…

    Submitted 6 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Published in EMNLP 2023

  20. arXiv:2305.14784

    cs.AI cs.CL cs.CY cs.LG

    Anthropomorphization of AI: Opportunities and Risks

    Authors: Ameet Deshpande, Tanmay Rajpurohit, Karthik Narasimhan, Ashwin Kalyan

    Abstract: Anthropomorphization is the tendency to attribute human-like traits to non-human entities. It is prevalent in many social contexts -- children anthropomorphize toys, adults do so with brands, and it is a literary device. It is also a versatile tool in science, with behavioral psychology and evolutionary biology meticulously documenting its consequences. With widespread adoption of AI systems, and…

    Submitted 24 May, 2023; originally announced May 2023.

  21. arXiv:2305.14706

    cs.LG cs.AI

    PruMUX: Augmenting Data Multiplexing with Model Compression

    Authors: Yushan Su, Vishvak Murahari, Karthik Narasimhan, Kai Li

    Abstract: As language models increase in size by the day, methods for efficient inference are critical to leveraging their capabilities for various applications. Prior work has investigated techniques like model pruning, knowledge distillation, and data multiplexing to increase model throughput without sacrificing accuracy. In this paper, we combine two such methods -- structured pruning and data multiplexi…

    Submitted 23 August, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Published at Findings of the Association for Computational Linguistics (ACL 2023)

  22. arXiv:2305.11619

    cs.SE cs.AI

    Towards Code Generation from BDD Test Case Specifications: A Vision

    Authors: Leon Chemnitz, David Reichenbach, Hani Aldebes, Mariam Naveed, Krishna Narasimhan, Mira Mezini

    Abstract: Automatic code generation has recently attracted large attention and is becoming more significant to the software development process. Solutions based on Machine Learning and Artificial Intelligence are being used to increase human and software efficiency in potent and innovative ways. In this paper, we aim to leverage these developments and introduce a novel approach to generating frontend compon…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted for publication at the International Conference on AI Engineering (CAIN) 2023

  23. arXiv:2305.10601

    cs.CL cs.AI cs.LG

    Tree of Thoughts: Deliberate Problem Solving with Large Language Models

    Authors: Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan

    Abstract: Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for…

    Submitted 3 December, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023 camera ready version. Code repo with all prompts: https://github.com/princeton-nlp/tree-of-thought-llm

  24. arXiv:2304.05335

    cs.CL cs.AI cs.LG

    Toxicity in ChatGPT: Analyzing Persona-assigned Language Models

    Authors: Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan

    Abstract: Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community, with adoption throughout many services like healthcare, therapy, education, and customer service. Since users include people with critical information needs like students or patients engaging with chatbots, the safety of these systems is of prime importance. Therefore, a…

    Submitted 11 April, 2023; originally announced April 2023.

  25. arXiv:2303.11366

    cs.AI cs.CL cs.LG

    Reflexion: Language Agents with Verbal Reinforcement Learning

    Authors: Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao

    Abstract: Large language models (LLMs) have been increasingly used to interact with external environments (e.g., games, compilers, APIs) as goal-driven agents. However, it remains challenging for these language agents to quickly and efficiently learn from trial-and-error as traditional reinforcement learning methods require extensive training samples and expensive model fine-tuning. We propose Reflexion, a…

    Submitted 10 October, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: v4 contains a few additional experiments

  26. arXiv:2302.12441

    cs.LG cs.CL

    MUX-PLMs: Data Multiplexing for High-throughput Language Models

    Authors: Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan

    Abstract: The widespread adoption of large language models such as ChatGPT and Bard has led to unprecedented demand for these technologies. The burgeoning cost of inference for ever-increasing model sizes coupled with hardware shortages has limited affordable access and poses a pressing need for efficiency approaches geared towards high throughput and performance. Multi-input multi-output (MIMO) algorithms…

    Submitted 22 May, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

  27. arXiv:2301.11309

    cs.CL

    SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification

    Authors: Pranjal Aggarwal, Ameet Deshpande, Karthik Narasimhan

    Abstract: Extreme classification (XC) involves predicting over large numbers of classes (thousands to millions), with real-world applications like news article classification and e-commerce product tagging. The zero-shot version of this task requires generalization to novel classes without additional supervision. In this paper, we develop SemSup-XC, a model that achieves state-of-the-art zero-shot and few-s…

    Submitted 22 June, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: Published at ICML 2023. V2: camera ready version at ICML 2023

  28. arXiv:2301.06866

    cs.CV

    Building Scalable Video Understanding Benchmarks through Sports

    Authors: Aniket Agarwal, Alex Zhang, Karthik Narasimhan, Igor Gilitschenski, Vishvak Murahari, Yash Kant

    Abstract: Existing benchmarks for evaluating long video understanding fall short on two critical aspects, either lacking in scale or quality of annotations. These limitations arise from the difficulty in collecting dense annotations for long videos, which often require manually labeling each frame. In this work, we introduce an automated Annotation and Video Stream Alignment Pipeline (abbreviated ASAP). We…

    Submitted 26 March, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

  29. arXiv:2212.10466

    cs.CL

    Controllable Text Generation with Language Constraints

    Authors: Howard Chen, Huihan Li, Danqi Chen, Karthik Narasimhan

    Abstract: We consider the task of text generation in language models with constraints specified in natural language. To this end, we first create a challenging benchmark Cognac that provides as input to the model a topic with example text, along with a constraint on text to be avoided. Unlike prior work, our benchmark contains knowledge-intensive constraints sourced from databases like Wordnet and Wikidata,…

    Submitted 20 December, 2022; originally announced December 2022.

  30. arXiv:2211.16634

    cs.CL cs.AI cs.LG

    SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers

    Authors: Ameet Deshpande, Md Arafat Sultan, Anthony Ferritto, Ashwin Kalyan, Karthik Narasimhan, Avirup Sil

    Abstract: Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger. Since a different copy of the model is required for each task, this paradigm is infeasible for storage-constrained edge devices like mobile phones. In this paper, we propose SPARTAN, a parameter efficient (PE) and computationally fast…

    Submitted 29 November, 2022; originally announced November 2022.

  31. arXiv:2211.08547

    cs.CL cs.AI cs.LG

    ALIGN-MLM: Word Embedding Alignment is Crucial for Multilingual Pre-training

    Authors: Henry Tang, Ameet Deshpande, Karthik Narasimhan

    Abstract: Multilingual pre-trained models exhibit zero-shot cross-lingual transfer, where a model fine-tuned on a source language achieves surprisingly good performance on a target language. While studies have attempted to understand transfer, they focus only on MLM, and the large number of differences between natural languages makes it hard to disentangle the importance of different properties. In this wor…

    Submitted 15 November, 2022; originally announced November 2022.

  32. arXiv:2210.03629

    cs.CL cs.AI cs.LG

    ReAct: Synergizing Reasoning and Acting in Language Models

    Authors: Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao

    Abstract: While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific acti…

    Submitted 9 March, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: v3 is the ICLR camera ready version with some typos fixed. Project site with code: https://react-lm.github.io

  33. arXiv:2209.11103

    cs.CR cs.SE

    To Fix or Not to Fix: A Critical Study of Crypto-misuses in the Wild

    Authors: Anna-Katharina Wickert, Lars Baumgärtner, Michael Schlichtig, Krishna Narasimhan, Mira Mezini

    Abstract: Recent studies have revealed that 87 % to 96 % of the Android apps using cryptographic APIs have a misuse which may cause security vulnerabilities. As previous studies did not conduct a qualitative examination of the validity and severity of the findings, our objective was to understand the findings in more depth. We analyzed a set of 936 open-source Java applications for cryptographic misuses. Ou…

    Submitted 24 March, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

    Comments: 8 pages, published in 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), December 09-11, 2022, Wuhan, China

  34. arXiv:2207.01206

    cs.CL cs.AI cs.LG

    WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

    Authors: Shunyu Yao, Howard Chen, John Yang, Karthik Narasimhan

    Abstract: Existing benchmarks for grounding language in interactive environments either lack real-world linguistic elements, or prove difficult to scale up due to substantial human involvement in the collection of data or feedback signals. To bridge this gap, we develop WebShop -- a simulated e-commerce website environment with $1.18$ million real-world products and $12,087$ crowd-sourced text instructions.…

    Submitted 7 February, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: Project page with code, data, demos: https://webshop-pnlp.github.io. v3 is NeurIPS camera ready version. v4 fixes the choice oracle result as per https://github.com/princeton-nlp/WebShop/issues/15

  35. arXiv:2206.13074

    cs.RO cs.AI cs.LG

    Leveraging Language for Accelerated Learning of Tool Manipulation

    Authors: Allen Z. Ren, Bharat Govil, Tsung-Yen Yang, Karthik Narasimhan, Anirudha Majumdar

    Abstract: Robust and generalized tool manipulation requires an understanding of the properties and affordances of different tools. We investigate whether linguistic information about a tool (e.g., its geometry, common uses) can help control policies adapt faster to new tools for a given task. We obtain diverse descriptions of various tools in natural language and use pre-trained language models to generate…

    Submitted 27 June, 2022; originally announced June 2022.

  36. arXiv:2205.11558

    cs.AI

    Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines

    Authors: Sreejan Kumar, Carlos G. Correa, Ishita Dasgupta, Raja Marjieh, Michael Y. Hu, Robert D. Hawkins, Nathaniel D. Daw, Jonathan D. Cohen, Karthik Narasimhan, Thomas L. Griffiths

    Abstract: Strong inductive biases give humans the ability to quickly learn to perform a variety of tasks. Although meta-learning is a method to endow neural networks with useful inductive biases, agents trained by meta-learning may sometimes acquire very different strategies from humans. We show that co-training these agents on predicting representations from natural language task descriptions and programs…

    Submitted 5 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), winner of Outstanding Paper Award

  37. arXiv:2204.11790

    cs.CL cs.CR cs.LG

    Can Rationalization Improve Robustness?

    Authors: Howard Chen, Jacqueline He, Karthik Narasimhan, Danqi Chen

    Abstract: A growing line of work has investigated the development of neural NLP models that can produce rationales--subsets of input that can explain their model predictions. In this paper, we ask whether such rationale models can also provide robustness to adversarial attacks in addition to their interpretable nature. Since these models need to first generate rationales ("rationalizer") before making predi…

    Submitted 3 May, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: Accepted to NAACL 2022; The code is available at https://github.com/princeton-nlp/rationale-robustness

  38. arXiv:2203.13344

    cs.CL cs.AI cs.LG

    Linking Emergent and Natural Languages via Corpus Transfer

    Authors: Shunyu Yao, Mo Yu, Yang Zhang, Karthik R Narasimhan, Joshua B. Tenenbaum, Chuang Gan

    Abstract: The study of language emergence aims to understand how human languages are shaped by perceptual grounding and communicative intent. Computational approaches to emergent communication (EC) predominantly consider referential games in limited domains and analyze the learned protocol within the game framework. As a result, it remains unclear how the emergent languages from these settings connect to na…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: ICLR 2022 Spotlight. Github repo: https://github.com/ysymyth/ec-nl

  39. arXiv:2203.07613

    cs.CL cs.CV

    CARETS: A Consistency And Robustness Evaluative Test Suite for VQA

    Authors: Carlos E. Jimenez, Olga Russakovsky, Karthik Narasimhan

    Abstract: We introduce CARETS, a systematic test suite to measure consistency and robustness of modern VQA models through a series of six fine-grained capability tests. In contrast to existing VQA test sets, CARETS features balanced question generation to create pairs of instances to test models, with each pair focusing on a specific capability such as rephrasing, logical symmetry or image obfuscation. We e…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  40. arXiv:2202.13100

    cs.LG cs.CL cs.CV

    SemSup: Semantic Supervision for Simple and Scalable Zero-shot Generalization

    Authors: Austin W. Hanjie, Ameet Deshpande, Karthik Narasimhan

    Abstract: Zero-shot learning is the problem of predicting instances over classes not seen during training. One approach to zero-shot learning is providing auxiliary class information to the model. Prior work along this vein has largely used expensive per-instance annotation or singular class-level descriptions, but per-instance descriptions are hard to scale and single class descriptions may not be rich en…

    Submitted 30 January, 2023; v1 submitted 26 February, 2022; originally announced February 2022.

  41. arXiv:2202.09318

    cs.LG cs.AI

    DataMUX: Data Multiplexing for Neural Networks

    Authors: Vishvak Murahari, Carlos E. Jimenez, Runzhe Yang, Karthik Narasimhan

    Abstract: In this paper, we introduce data multiplexing (DataMUX), a technique that enables deep neural networks to process multiple inputs simultaneously using a single compact representation. DataMUX demonstrates that neural networks are capable of generating accurate predictions over mixtures of inputs, resulting in increased throughput with minimal extra memory requirements. Our approach uses two key co…

    Submitted 14 November, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022

  42. arXiv:2201.03639  [pdf, other]

    cs.CV

    Multi-Query Video Retrieval

    Authors: Zeyu Wang, Yu Wu, Karthik Narasimhan, Olga Russakovsky

    Abstract: Retrieving target videos based on text descriptions is a task of great practical value and has received increasing attention over the past few years. Despite recent progress, imperfect annotations in existing video retrieval datasets have posed significant challenges on model evaluation and development. In this paper, we tackle this issue by focusing on the less-studied setting of multi-query vide…

    Submitted 20 July, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: ECCV 2022

  43. arXiv:2201.01251  [pdf, other]

    cs.CL

    Multi-Stage Episodic Control for Strategic Exploration in Text Games

    Authors: Jens Tuyls, Shunyu Yao, Sham Kakade, Karthik Narasimhan

    Abstract: Text adventure games present unique challenges to reinforcement learning methods due to their combinatorially large action spaces and sparse rewards. The interplay of these two factors is particularly demanding because large action spaces require extensive exploration, while sparse rewards provide limited feedback. This work proposes to tackle the explore-vs-exploit dilemma using a multi-stage app…

    Submitted 15 March, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

    Comments: ICLR 2022 (Spotlight) - https://sites.google.com/princeton.edu/xtx

  44. arXiv:2110.14782  [pdf, other]

    cs.CL cs.LG

    When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer

    Authors: Ameet Deshpande, Partha Talukdar, Karthik Narasimhan

    Abstract: While recent work on multilingual language models has demonstrated their capacity for cross-lingual zero-shot transfer on downstream tasks, there is a lack of consensus in the community as to what shared properties between languages enable such transfer. Analyses involving pairs of natural languages are often inconclusive and contradictory since languages simultaneously differ in many linguistic a…

    Submitted 3 May, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted at NAACL 2022

  45. arXiv:2110.10661  [pdf, other]

    cs.CL cs.AI cs.LG

    SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark

    Authors: Victor Zhong, Austin W. Hanjie, Sida I. Wang, Karthik Narasimhan, Luke Zettlemoyer

    Abstract: Existing work in language grounding typically study single environments. How do we build unified models that apply across multiple environments? We propose the multi-environment Symbolic Interactive Language Grounding benchmark (SILG), which unifies a collection of diverse grounded language learning environments under a common interface. SILG consists of grid-world environments that require genera…

    Submitted 24 January, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021. 14 pages, 8 figures

  46. arXiv:2109.06613  [pdf, other]

    cs.CR cs.SE

    Exploring the Use of Static and Dynamic Analysis to Improve the Performance of the Mining Sandbox Approach for Android Malware Identification

    Authors: Francisco Handrick da Costa, Ismael Medeiros, Thales Menezes, João Victor da Silva, Ingrid Lorraine da Silva, Rodrigo Bonifácio, Krishna Narasimhan, Márcio Ribeiro

    Abstract: The Android mining sandbox approach consists in running dynamic analysis tools on a benign version of an Android app and recording every call to sensitive APIs. Later, one can use this information to (a) prevent calls to other sensitive APIs (those not previously recorded) or (b) run the dynamic analysis tools again in a different version of the app -- in order to identify possible malicious behav…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: 31 pages, 6 figures. Paper accepted for publication in The Journal of Systems & Software

  47. arXiv:2108.09245  [pdf, other]

    cs.SE

    Fex: Assisted Identification of Domain Features from C Programs

    Authors: Patrick Müller, Krishna Narasimhan, Mira Mezini

    Abstract: Modern software typically performs more than one functionality. These functionalities or features are not always organized in a way for modules representing these features to be used individually. Many software engineering approaches like programming language constructs, or product line visualization techniques have been proposed to organize projects as modules. Unfortunately, much legacy software…

    Submitted 20 August, 2021; originally announced August 2021.

  48. arXiv:2106.14347  [pdf, other]

    cs.DC cs.LG

    Revelio: ML-Generated Debugging Queries for Distributed Systems

    Authors: Pradeep Dogga, Karthik Narasimhan, Anirudh Sivaraman, Shiv Kumar Saini, George Varghese, Ravi Netravali

    Abstract: A major difficulty in debugging distributed systems lies in manually determining which of the many available debugging tools to use and how to query its logs. Our own study of a production debugging workflow confirms the magnitude of this burden. This paper explores whether a machine-learning model can assist developers in distributed systems debugging. We present Revelio, a debugging assistant wh…

    Submitted 27 June, 2021; originally announced June 2021.

  49. arXiv:2105.11115  [pdf, other]

    cs.CL cs.AI cs.FL

    Self-Attention Networks Can Process Bounded Hierarchical Languages

    Authors: Shunyu Yao, Binghui Peng, Christos Papadimitriou, Karthik Narasimhan

    Abstract: Despite their impressive performance in NLP, self-attention networks were recently proved to be limited for processing formal languages with hierarchical structure, such as $\mathsf{Dyck}_k$, the language consisting of well-nested parentheses of $k$ types. This suggested that natural language can be approximated well with models that are too weak for formal languages, or that the role of hierarchy…

    Submitted 12 March, 2023; v1 submitted 24 May, 2021; originally announced May 2021.

    Comments: ACL 2021. 19 pages with extended appendix. Fixed a small typo in the formula at the end of page 5 (thanks to Gabriel Faria). Code: https://github.com/princeton-nlp/dyck-transformer

  50. arXiv:2105.04950  [pdf, other]

    cs.CR cs.SE

    Dealing with Variability in API Misuse Specification

    Authors: Rodrigo Bonifacio, Stefan Krüger, Krishna Narasimhan, Eric Bodden, Mira Mezini

    Abstract: APIs are the primary mechanism for developers to gain access to externally defined services and tools. However, previous research has revealed API misuses that violate the contract of APIs to be prevalent. Such misuses can have harmful consequences, especially in the context of cryptographic libraries. Various API misuse detectors have been proposed to address this issue including CogniCrypt, one…

    Submitted 17 May, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

    Comments: 28 pages, 16 figures

    MSC Class: 68N19 ACM Class: D.2.1; D.3.3