Showing 1–2 of 2 results for author: Kannappan, A

  1. arXiv:2311.11944  [pdf, other]

    cs.CL cs.AI cs.CE stat.ML

    FinanceBench: A New Benchmark for Financial Question Answering

    Authors: Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, Nino Scherrer, Bertie Vidgen

    Abstract: FinanceBench is a first-of-its-kind test suite for evaluating the performance of LLMs on open book financial question answering (QA). It comprises 10,231 questions about publicly traded companies, with corresponding answers and evidence strings. The questions in FinanceBench are ecologically valid and cover a diverse set of scenarios. They are intended to be clear-cut and straightforward to answer…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Dataset is available at: https://huggingface.co/datasets/PatronusAI/financebench

  2. arXiv:2311.08370  [pdf, other]

    cs.CL

    SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

    Authors: Bertie Vidgen, Nino Scherrer, Hannah Rose Kirk, Rebecca Qian, Anand Kannappan, Scott A. Hale, Paul Röttger

    Abstract: The past year has seen rapid acceleration in the development of large language models (LLMs). However, without proper steering and safeguards, LLMs will readily follow malicious instructions, provide unsafe advice, and generate toxic content. We introduce SimpleSafetyTests (SST) as a new test suite for rapidly and systematically identifying such critical safety risks. The test suite comprises 100…

    Submitted 16 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.