Showing 1–1 of 1 results for author: Tam, Z R

  1. arXiv:2406.08747

    cs.CL

    StreamBench: Towards Benchmarking Continuous Improvement of Language Agents

    Authors: Cheng-Kuang Wu, Zhi Rui Tam, Chieh-Yen Lin, Yun-Nung Chen, Hung-yi Lee

    Abstract: Recent works have shown that large language model (LLM) agents are able to improve themselves from experience, which is an important ability for continuous enhancement post-deployment. However, existing benchmarks primarily evaluate their innate capabilities and do not assess their ability to improve over time. To address this gap, we introduce StreamBench, a pioneering benchmark designed to evalu…

    Submitted 12 June, 2024; originally announced June 2024.