Showing 1–2 of 2 results for author: Bentham, O

Search v0.5.6 released 2020-02-24

arXiv:2407.04965

cs.CL

Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression

Authors: Zhichao Xu, Ashim Gupta, Tao Li, Oliver Bentham, Vivek Srikumar

Abstract: Large language models (LLMs) are increasingly deployed in real-world scenarios with the help of recent model compression techniques. This momentum toward local deployment means that compressed LLMs will affect a large population. However, prior analyses often prioritize preserving perplexity, which is a direct analogue of training loss. The impact of compression methods on other critical aspects of model behavior, particularly safety, still calls for a systematic assessment. To this end, we investigate the impact of model compression along four dimensions: 1) degeneration harm, i.e., bias and toxicity in generation; 2) representational harm, i.e., biases in discriminative tasks; 3) dialect bias; and 4) language modeling and downstream task performance. We cover a wide spectrum of LLM compression techniques, including structured pruning, unstructured and semi-structured pruning, and quantization. Our analyses reveal that compression can lead to unexpected consequences. Although compression may unintentionally remedy LLMs' degeneration harm, it can still exacerbate representational harm. Moreover, compression affects different protected groups divergently as the compression rate grows. Finally, different compression methods have drastically different safety impacts; e.g., quantization mostly preserves bias, while pruning degrades quickly. Our findings underscore the importance of integrating safety assessments into the development of compressed LLMs to ensure their reliability across real-world applications.
Our full results are available here: https://github.com/zhichaoxu-shufe/Beyond-Perplexity-Compression-Safety-Eval

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2402.14897

cs.CL cs.AI cs.LG

Chain-of-Thought Unfaithfulness as Disguised Accuracy

Authors: Oliver Bentham, Nathan Stringham, Ana Marasović

Abstract: Understanding the extent to which Chain-of-Thought (CoT) generations align with a large language model's (LLM's) internal computations is critical for deciding whether to trust an LLM's output. As a proxy for CoT faithfulness, Lanham et al. (2023) propose a metric that measures a model's dependence on its CoT for producing an answer. Within a single family of proprietary models, they find that LLMs exhibit a scaling-then-inverse-scaling relationship between model size and their measure of faithfulness, and that a 13-billion-parameter model exhibits greater faithfulness than models ranging from 810 million to 175 billion parameters. We evaluate whether these results generalize as a property of all LLMs. Using three different families of models, we replicate the experimental setup of their scaling experiments and, under specific conditions, successfully reproduce the CoT-faithfulness scaling trends they report. However, after the metric is normalized to account for a model's bias toward certain answer choices, unfaithfulness drops significantly for smaller, less capable models. This normalized faithfulness metric is also strongly correlated with accuracy (R² = 0.74), raising doubts about its validity for evaluating faithfulness.

Submitted 21 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: Camera-ready version of the TMLR-accepted paper. The first two authors contributed equally. 8 pages main text, 13 pages appendix.
