Showing 1–11 of 11 results for author: Cabrera, A A

Searching in archive cs.
  1. arXiv:2312.11444

    cs.CL cs.AI

    An In-depth Look at Gemini's Language Abilities

    Authors: Syeda Nahida Akter, Zichun Yu, Aashiq Muhamed, Tianyue Ou, Alex Bäuerle, Ángel Alexander Cabrera, Krish Dholakia, Chenyan Xiong, Graham Neubig

    Abstract: The recently released Google Gemini class of models are the first to comprehensively report results that rival the OpenAI GPT series across a wide variety of tasks. In this paper, we do an in-depth exploration of Gemini's language abilities, making two contributions. First, we provide a third-party, objective comparison of the abilities of the OpenAI GPT and Google Gemini models with reproducible…

    Submitted 24 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

  2. Where Does My Model Underperform? A Human Evaluation of Slice Discovery Algorithms

    Authors: Nari Johnson, Ángel Alexander Cabrera, Gregory Plumb, Ameet Talwalkar

    Abstract: Machine learning (ML) models that achieve high average accuracy can still underperform on semantically coherent subsets ("slices") of data. This behavior can have significant societal consequences for the safety or bias of the model in deployment, but identifying these underperforming slices can be difficult in practice, especially in domains where practitioners lack access to group annotations to…

    Submitted 9 February, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 11(1), 65-76. Best Paper Award

  3. Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning

    Authors: Ángel Alexander Cabrera, Erica Fu, Donald Bertucci, Kenneth Holstein, Ameet Talwalkar, Jason I. Hong, Adam Perer

    Abstract: Machine learning models with high accuracy on test data can still produce systematic failures, such as harmful biases and safety issues, when deployed in the real world. To detect and mitigate such failures, practitioners run behavioral evaluation of their models, checking model outputs for specific types of inputs. Behavioral evaluation is important but challenging, requiring that practitioners d…

    Submitted 9 February, 2023; originally announced February 2023.

  4. arXiv:2301.06937

    cs.HC cs.AI

    Improving Human-AI Collaboration With Descriptions of AI Behavior

    Authors: Ángel Alexander Cabrera, Adam Perer, Jason I. Hong

    Abstract: People work with AI systems to improve their decision making, but often under- or over-rely on AI predictions and perform worse than they would have unassisted. To help people appropriately rely on AI aids, we propose showing them behavior descriptions, details of how AI systems perform on subgroups of instances. We tested the efficacy of behavior descriptions through user studies with 225 partici…

    Submitted 5 January, 2023; originally announced January 2023.

    Comments: 21 pages

    Journal ref: Proc. ACM Hum.-Comput. Interact. 7, CSCW1, Article 136 (April 2023)

  5. arXiv:2207.04104

    cs.LG cs.CV

    Towards a More Rigorous Science of Blindspot Discovery in Image Classification Models

    Authors: Gregory Plumb, Nari Johnson, Ángel Alexander Cabrera, Ameet Talwalkar

    Abstract: A growing body of work studies Blindspot Discovery Methods ("BDM"s): methods that use an image embedding to find semantically meaningful (i.e., united by a human-understandable concept) subsets of the data where an image classifier performs significantly worse. Motivated by observed gaps in prior work, we introduce a new framework for evaluating BDMs, SpotCheck, that uses synthetic image datasets…

    Submitted 11 July, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: reviewed on OpenReview: https://openreview.net/forum?id=MaDvbLaBiF

    Journal ref: TMLR 2023

  6. arXiv:2204.10814

    cs.HC cs.AI

    "Public(s)-in-the-Loop": Facilitating Deliberation of Algorithmic Decisions in Contentious Public Policy Domains

    Authors: Hong Shen, Ángel Alexander Cabrera, Adam Perer, Jason Hong

    Abstract: This position paper offers a framework to think about how to better involve human influence in algorithmic decision-making of contentious public policy issues. Drawing from insights in communication literature, we introduce a "public(s)-in-the-loop" approach and enumerate three features that are central to this approach: publics as plural political entities, collective decision-making through del…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: 5 pages, 0 figures, accepted to CHI2020 Fair & Responsible AI Workshop

  7. arXiv:2202.08946

    cs.HC cs.AI cs.LG

    Symphony: Composing Interactive Interfaces for Machine Learning

    Authors: Alex Bäuerle, Ángel Alexander Cabrera, Fred Hohman, Megan Maher, David Koski, Xavier Suau, Titus Barik, Dominik Moritz

    Abstract: Interfaces for machine learning (ML), information and visualizations about models or data, can help practitioners build robust and responsible ML systems. Despite their benefits, recent studies of ML teams and our interviews with practitioners (n=9) showed that ML interfaces have limited adoption in practice. While existing ML interfaces are effective for specific tasks, they are not designed to b…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems

    ACM Class: H.2.m; I.7.m

  8. arXiv:2109.11690

    cs.HC cs.LG

    Discovering and Validating AI Errors With Crowdsourced Failure Reports

    Authors: Ángel Alexander Cabrera, Abraham J. Druck, Jason I. Hong, Adam Perer

    Abstract: AI systems can fail to learn important behaviors, leading to real-world issues like safety concerns and biases. Discovering these systematic failures often requires significant developer attention, from hypothesizing potential edge cases to collecting evidence and validating patterns. To scale and streamline this process, we introduce crowdsourced failure reports, end-user descriptions of how or w…

    Submitted 23 September, 2021; originally announced September 2021.

  9. FairVis: Visual Analytics for Discovering Intersectional Bias in Machine Learning

    Authors: Ángel Alexander Cabrera, Will Epperson, Fred Hohman, Minsuk Kahng, Jamie Morgenstern, Duen Horng Chau

    Abstract: The growing capability and accessibility of machine learning has led to its application to many real-world domains and data about people. Despite the benefits algorithmic systems may bring, models can reflect, inject, or exacerbate implicit and explicit societal biases into their outputs, disadvantaging certain demographic subgroups. Discovering which biases a machine learning model has introduced…

    Submitted 1 September, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

    Comments: Accepted as a VAST conference paper to IEEE VIS'19

  10. arXiv:1902.06787

    cs.LG stat.ML

    Regularizing Black-box Models for Improved Interpretability

    Authors: Gregory Plumb, Maruan Al-Shedivat, Angel Alexander Cabrera, Adam Perer, Eric Xing, Ameet Talwalkar

    Abstract: Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade off accuracy for interpretability, or post-hoc explanation systems, whose explanation quality can be unpredictable. Our method, ExpO, is a hybridization of these approaches that regularizes a model for explanation quality at training time. Importantly, these reg…

    Submitted 8 November, 2020; v1 submitted 18 February, 2019; originally announced February 2019.

  11. arXiv:1806.05660

    cs.CV cs.AI cs.HC

    Interactive Classification for Deep Learning Interpretation

    Authors: Ángel Alexander Cabrera, Fred Hohman, Jason Lin, Duen Horng Chau

    Abstract: We present an interactive system enabling users to manipulate images to explore the robustness and sensitivity of deep learning image classifiers. Using modern web technologies to run in-browser inference, users can remove image features using inpainting algorithms and obtain new classifications in real time, which allows them to ask a variety of "what if" questions by experimentally modifying ima…

    Submitted 10 April, 2019; v1 submitted 14 June, 2018; originally announced June 2018.

    Comments: Presented as a demo at CVPR'18