Showing 1–4 of 4 results for author: Shinn, N

  1. arXiv:2406.12045  [pdf, other]

    cs.AI cs.CL

    $τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

    Authors: Shunyu Yao, Noah Shinn, Pedram Razavi, Karthik Narasimhan

    Abstract: Existing benchmarks do not test language agents on their interaction with human users or ability to follow domain-specific rules, both of which are vital for deploying them in real world applications. We propose $τ$-bench, a benchmark emulating dynamic conversations between a user (simulated by language models) and a language agent provided with domain-specific API tools and policy guidelines. We…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2312.12450  [pdf, other]

    cs.SE cs.AI cs.LG cs.PL

    Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

    Authors: Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Jacob Ginesin, Edward Berman, George Chakhnashvili, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha

    Abstract: A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language, synthesizing tests from code, and synthesizing explanations of code. In contrast, the behavior of instructional code editing with LLMs is understudied. These are tasks in which the model is provided a block of c…

    Submitted 19 March, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

  3. arXiv:2305.17145  [pdf, other]

    cs.SE cs.LG cs.PL

    Type Prediction With Program Decomposition and Fill-in-the-Type Training

    Authors: Federico Cassano, Ming-Ho Yee, Noah Shinn, Arjun Guha, Steven Holtzen

    Abstract: TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain. This has motivated automated type prediction: given an untyped program, produce a well-typed output program. Large language models (LLMs) are promising for type prediction, but there are challenges: fill-in-the-middle performs poorly, programs may not…

    Submitted 25 May, 2023; originally announced May 2023.

  4. arXiv:2303.11366  [pdf, other]

    cs.AI cs.CL cs.LG

    Reflexion: Language Agents with Verbal Reinforcement Learning

    Authors: Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao

    Abstract: Large language models (LLMs) have been increasingly used to interact with external environments (e.g., games, compilers, APIs) as goal-driven agents. However, it remains challenging for these language agents to quickly and efficiently learn from trial-and-error as traditional reinforcement learning methods require extensive training samples and expensive model fine-tuning. We propose Reflexion, a…

    Submitted 10 October, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: v4 contains a few additional experiments