Showing 1–11 of 11 results for author: Koenecke, A

  1. Careless Whisper: Speech-to-Text Hallucination Harms

    Authors: Allison Koenecke, Anna Seo Gyeong Choi, Katelyn X. Mei, Hilke Schellmann, Mona Sloane

    Abstract: Speech-to-text services aim to transcribe input audio as accurately as possible. They increasingly play a role in everyday life, for example in personal voice assistants or in customer-company interactions. We evaluate OpenAI's Whisper, a state-of-the-art automated speech recognition service outperforming industry competitors, as of 2023. While many of Whisper's transcriptions were highly accurat…

    Submitted 2 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  2. arXiv:2306.11839  [pdf, other]

    stat.ME cs.LG stat.AP stat.ML

    Should I Stop or Should I Go: Early Stopping with Heterogeneous Populations

    Authors: Hammaad Adam, Fan Yin, Huibin Hu, Neil Tenenholtz, Lorin Crawford, Lester Mackey, Allison Koenecke

    Abstract: Randomized experiments often need to be stopped prematurely due to the treatment having an unintended harmful effect. Existing methods that determine when to stop an experiment early are typically applied to the data in aggregate and do not account for treatment effect heterogeneity. In this paper, we study the early stopping of experiments for harm on heterogeneous populations. We first establish…

    Submitted 27 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (spotlight)

  3. Auditing Cross-Cultural Consistency of Human-Annotated Labels for Recommendation Systems

    Authors: Rock Yuren Pang, Jack Cenatempo, Franklyn Graham, Bridgette Kuehn, Maddy Whisenant, Portia Botchway, Katie Stone Perez, Allison Koenecke

    Abstract: Recommendation systems increasingly depend on massive human-labeled datasets; however, the human annotators hired to generate these labels increasingly come from homogeneous backgrounds. This poses an issue when downstream predictive models -- based on these labels -- are applied globally to a heterogeneous set of users. We study this disconnect with respect to the labels themselves, asking whethe…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted at FAccT 2023

  4. Augmented Datasheets for Speech Datasets and Ethical Decision-Making

    Authors: Orestis Papakyriakopoulos, Anna Seo Gyeong Choi, Jerone Andrews, Rebecca Bourke, William Thong, Dora Zhao, Alice Xiang, Allison Koenecke

    Abstract: Speech datasets are crucial for training Speech Language Technologies (SLT); however, the lack of diversity of the underlying training data can lead to serious limitations in building equitable and robust SLT products, especially along dimensions of language, accent, dialect, variety, and speech impairment - and the intersectionality of speech features with socioeconomic and demographic features.…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: To appear in 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT '23), June 12-15, Chicago, IL, USA

  5. arXiv:2304.08530  [pdf, other]

    cs.CY

    Popular Support for Balancing Equity and Efficiency in Resource Allocation: A Case Study in Online Advertising to Increase Welfare Program Awareness

    Authors: Allison Koenecke, Eric Giannella, Robb Willer, Sharad Goel

    Abstract: Algorithmically optimizing the provision of limited resources is commonplace across domains from healthcare to lending. Optimization can lead to efficient resource allocation, but, if deployed without additional scrutiny, can also exacerbate inequality. Little is known about popular preferences regarding acceptable efficiency-equity trade-offs, making it difficult to design algorithms that are res…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: This paper will be presented at the 2023 International Conference on Web and Social Media (ICWSM'23)

  6. Potential for allocative harm in an environmental justice data tool

    Authors: Benjamin Q. Huynh, Elizabeth T. Chin, Allison Koenecke, Derek Ouyang, Daniel E. Ho, Mathew V. Kiang, David H. Rehkopf

    Abstract: Neighborhood-level screening algorithms are increasingly being deployed to inform policy decisions. We evaluate one such algorithm, CalEnviroScreen - designed to promote environmental justice and used to guide hundreds of millions of dollars in public funding annually - assessing its potential for allocative harm. We observe the model to be sensitive to subjective model decisions, with 16% of trac…

    Submitted 12 April, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

    Journal ref: Nat Mach Intell 6, 187-194 (2024)

  7. arXiv:2205.07333  [pdf, other]

    cs.HC cs.CV

    Trucks Don't Mean Trump: Diagnosing Human Error in Image Analysis

    Authors: J. D. Zamfirescu-Pereira, Jerry Chen, Emily Wen, Allison Koenecke, Nikhil Garg, Emma Pierson

    Abstract: Algorithms provide powerful tools for detecting and dissecting human bias and error. Here, we develop machine learning methods to analyze how humans err in a particular high-stakes task: image interpretation. We leverage a unique dataset of 16,135,392 human predictions of whether a neighborhood voted for Donald Trump or Joe Biden in the 2020 US election, based on a Google Street View image. We…

    Submitted 15 May, 2022; originally announced May 2022.

    Comments: To be published in FAccT 2022

  8. arXiv:2107.11732  [pdf, other]

    cs.LG econ.EM q-bio.QM stat.ME

    Federated Causal Inference in Heterogeneous Observational Data

    Authors: Ruoxuan Xiong, Allison Koenecke, Michael Powell, Zhu Shen, Joshua T. Vogelstein, Susan Athey

    Abstract: We are interested in estimating the effect of a treatment applied to individuals at multiple sites, where data is stored locally for each site. Due to privacy constraints, individual-level data cannot be shared across sites; the sites may also have heterogeneous populations and treatment assignment mechanisms. Motivated by these considerations, we develop federated methods to draw inference on the…

    Submitted 2 April, 2023; v1 submitted 25 July, 2021; originally announced July 2021.

  9. arXiv:2011.01374  [pdf, ps, other]

    econ.GN cs.LG

    Synthetic Data Generation for Economists

    Authors: Allison Koenecke, Hal Varian

    Abstract: As more tech companies engage in rigorous economic analyses, we are confronted with a data problem: in-house papers cannot be replicated due to use of sensitive, proprietary, or private data. Readers are left to assume that the obscured true data (e.g., internal Google information) indeed produced the results given, or they must seek out comparable public-facing data (e.g., Google Trends) that yie…

    Submitted 6 November, 2020; v1 submitted 2 November, 2020; originally announced November 2020.

  10. arXiv:1904.12887  [pdf, other]

    cs.LG q-fin.GN stat.ML

    Curriculum Learning in Deep Neural Networks for Financial Forecasting

    Authors: Allison Koenecke, Amita Gajewar

    Abstract: For any financial organization, computing accurate quarterly forecasts for various products is one of the most critical operations. As the granularity at which forecasts are needed increases, traditional statistical time series models may not scale well. We apply deep neural networks in the forecasting domain by experimenting with techniques from Natural Language Processing (Encoder-Decoder LSTMs)…

    Submitted 23 July, 2019; v1 submitted 29 April, 2019; originally announced April 2019.

  11. Learning Twitter User Sentiments on Climate Change with Limited Labeled Data

    Authors: Allison Koenecke, Jordi Feliu-Fabà

    Abstract: While it is well-documented that climate change accepters and deniers have become increasingly polarized in the United States over time, there has been no large-scale examination of whether these individuals are prone to changing their opinions as a result of natural external occurrences. On the sub-population of Twitter users, we examine whether climate change sentiment changes in response to fiv…

    Submitted 15 April, 2019; originally announced April 2019.