Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation

Fisch, Adam; Maynez, Joshua; Hofer, R. Alex; Dhingra, Bhuwan; Globerson, Amir; Cohen, William W.

arXiv:2406.04291 (cs)

[Submitted on 6 Jun 2024]

Abstract: Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. PPI achieves this by combining small amounts of human-labeled data with larger amounts of data labeled by a reasonably accurate -- but potentially biased -- automatic system, in a way that results in tighter confidence intervals for certain parameters of interest (e.g., the mean performance of a language model). In this paper, we propose a method called Stratified Prediction-Powered Inference (StratPPI), in which we show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies. Without making any assumptions on the underlying automatic labeling system or data distribution, we derive an algorithm for computing provably valid confidence intervals for population parameters (such as averages) that is based on stratified sampling. In particular, we show both theoretically and empirically that, with appropriate choices of stratification and sample allocation, our approach can provide substantially tighter confidence intervals than unstratified approaches. Specifically, StratPPI is expected to improve in cases where the performance of the autorater varies across different conditional distributions of the target data.
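
To make the idea in the abstract concrete, the following is a minimal sketch of a stratified PPI-style mean estimate, not the paper's algorithm. It assumes the classical PPI mean estimator (autorater mean on unlabeled data plus the average human-minus-autorater correction on labeled data), known stratum weights that sum to 1, and a normal-approximation confidence interval. The function names `ppi_mean` and `stratified_ppi_mean` and the input layout are hypothetical, and any tuning parameters or sample-allocation rules from the paper are omitted.

```python
import math
from statistics import NormalDist, fmean, variance


def ppi_mean(y_labeled, f_labeled, f_unlabeled):
    """Classical PPI mean estimate and its estimated variance (sketch).

    y_labeled:   human labels on the small labeled set
    f_labeled:   autorater scores on the same labeled examples
    f_unlabeled: autorater scores on the larger unlabeled set
    """
    n, big_n = len(y_labeled), len(f_unlabeled)
    # Rectifier: per-example error of the autorater against human labels.
    rectifier = [y - f for y, f in zip(y_labeled, f_labeled)]
    estimate = fmean(f_unlabeled) + fmean(rectifier)
    # The two terms come from independent samples, so their variances add.
    var = variance(f_unlabeled) / big_n + variance(rectifier) / n
    return estimate, var


def stratified_ppi_mean(strata, weights, alpha=0.05):
    """Combine per-stratum PPI estimates with known stratum weights.

    strata:  list of (y_labeled, f_labeled, f_unlabeled) tuples, one per stratum
    weights: assumed-known population proportion of each stratum (sums to 1)
    Returns (estimate, (lower, upper)) using a normal approximation.
    """
    estimate, var = 0.0, 0.0
    for (y_l, f_l, f_u), w in zip(strata, weights):
        est_k, var_k = ppi_mean(y_l, f_l, f_u)
        estimate += w * est_k
        var += w * w * var_k  # strata are sampled independently
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(var)
    return estimate, (estimate - half_width, estimate + half_width)
```

Each stratum needs at least two labeled and two unlabeled examples so that the sample variances are defined. In this sketch, a stratum where the autorater agrees closely with human labels contributes a small rectifier variance, which is the kind of per-stratum difference in autorater quality that the abstract says stratification can exploit.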

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2406.04291 [cs.LG]
	(or arXiv:2406.04291v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.04291

Submission history

From: Adam Fisch
[v1] Thu, 6 Jun 2024 17:37:39 UTC (5,824 KB)
