-
Between Randomness and Arbitrariness: Some Lessons for Reliable Machine Learning at Scale
Authors:
A. Feder Cooper
Abstract:
To develop rigorous knowledge about ML models -- and the systems in which they are embedded -- we need reliable measurements. But reliable measurement is fundamentally challenging, and touches on issues of reproducibility, scalability, uncertainty quantification, epistemology, and more. This dissertation addresses criteria needed to take reliability seriously: both criteria for designing meaningfu…
▽ More
To develop rigorous knowledge about ML models -- and the systems in which they are embedded -- we need reliable measurements. But reliable measurement is fundamentally challenging, and touches on issues of reproducibility, scalability, uncertainty quantification, epistemology, and more. This dissertation addresses criteria needed to take reliability seriously: both criteria for designing meaningful metrics, and for methodologies that ensure that we can dependably and efficiently measure these metrics at scale and in practice. In doing so, this dissertation articulates a research vision for a new field of scholarship at the intersection of machine learning, law, and policy. Within this frame, we cover topics that fit under three different themes: (1) quantifying and mitigating sources of arbitrariness in ML, (2) taming randomness in uncertainty estimation and optimization algorithms, in order to achieve scalability without sacrificing reliability, and (3) providing methods for evaluating generative-AI systems, with specific focuses on quantifying memorization in language models and training latent diffusion models on open-licensed data. By making contributions in these three themes, this dissertation serves as an empirical proof by example that research on reliable measurement for machine learning is intimately and inescapably bound up with research in law and policy. These different disciplines pose similar research questions about reliable measurement in machine learning. They are, in fact, two complementary sides of the same research vision, which, broadly construed, aims to construct machine-learning systems that cohere with broader societal values.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
The Files are in the Computer: Copyright, Memorization, and Generative-AI Systems
Authors:
A. Feder Cooper,
James Grimmelmann
Abstract:
A central issue in copyright lawsuits against companies that produce generative-AI systems is the degree to which a generative-AI model does or does not "memorize" the data it was trained on. Unfortunately, the debate has been clouded by ambiguity over what "memorization" is, leading to legal debates in which participants often talk past one another. In this Essay, we attempt to bring clarity to t…
▽ More
A central issue in copyright lawsuits against companies that produce generative-AI systems is the degree to which a generative-AI model does or does not "memorize" the data it was trained on. Unfortunately, the debate has been clouded by ambiguity over what "memorization" is, leading to legal debates in which participants often talk past one another. In this Essay, we attempt to bring clarity to the conversation over memorization and its relationship to copying that is cognizable by U.S. copyright law.
△ Less
Submitted 2 July, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Stealing Part of a Production Language Model
Authors:
Nicholas Carlini,
Daniel Paleka,
Krishnamurthy Dj Dvijotham,
Thomas Steinke,
Jonathan Hayase,
A. Feder Cooper,
Katherine Lee,
Matthew Jagielski,
Milad Nasr,
Arthur Conmy,
Eric Wallace,
David Rolnick,
Florian Tramèr
Abstract:
We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \…
▽ More
We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \$20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under \$2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
On the Standardization of Behavioral Use Clauses and Their Adoption for Responsible Licensing of AI
Authors:
Daniel McDuff,
Tim Korjakow,
Scott Cambo,
Jesse Josua Benjamin,
Jenny Lee,
Yacine Jernite,
Carlos Muñoz Ferrandis,
Aaron Gokaslan,
Alek Tarkowski,
Joseph Lindley,
A. Feder Cooper,
Danish Contractor
Abstract:
Growing concerns over negligent or malicious uses of AI have increased the appetite for tools that help manage the risks of the technology. In 2018, licenses with behaviorial-use clauses (commonly referred to as Responsible AI Licenses) were proposed to give developers a framework for releasing AI assets while specifying their users to mitigate negative applications. As of the end of 2023, on the…
▽ More
Growing concerns over negligent or malicious uses of AI have increased the appetite for tools that help manage the risks of the technology. In 2018, licenses with behaviorial-use clauses (commonly referred to as Responsible AI Licenses) were proposed to give developers a framework for releasing AI assets while specifying their users to mitigate negative applications. As of the end of 2023, on the order of 40,000 software and model repositories have adopted responsible AI licenses licenses. Notable models licensed with behavioral use clauses include BLOOM (language) and LLaMA2 (language), Stable Diffusion (image), and GRID (robotics). This paper explores why and how these licenses have been adopted, and why and how they have been adapted to fit particular use cases. We use a mixed-methods methodology of qualitative interviews, clustering of license clauses, and quantitative analysis of license adoption. Based on this evidence we take the position that responsible AI licenses need standardization to avoid confusing users or diluting their impact. At the same time, customization of behavioral restrictions is also appropriate in some contexts (e.g., medical domains). We advocate for ``standardized customization'' that can meet users' needs and can be supported via tooling.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
Scalable Extraction of Training Data from (Production) Language Models
Authors:
Milad Nasr,
Nicholas Carlini,
Jonathan Hayase,
Matthew Jagielski,
A. Feder Cooper,
Daphne Ippolito,
Christopher A. Choquette-Choo,
Eric Wallace,
Florian Tramèr,
Katherine Lee
Abstract:
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from…
▽ More
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when behaving properly. Our methods show practical attacks can recover far more data than previously thought, and reveal that current alignment techniques do not eliminate memorization.
△ Less
Submitted 28 November, 2023;
originally announced November 2023.
-
Report of the 1st Workshop on Generative AI and Law
Authors:
A. Feder Cooper,
Katherine Lee,
James Grimmelmann,
Daphne Ippolito,
Christopher Callison-Burch,
Christopher A. Choquette-Choo,
Niloofar Mireshghallah,
Miles Brundage,
David Mimno,
Madiha Zahrah Choksi,
Jack M. Balkin,
Nicholas Carlini,
Christopher De Sa,
Jonathan Frankle,
Deep Ganguli,
Bryant Gipson,
Andres Guadamuz,
Swee Leng Harris,
Abigail Z. Jacobs,
Elizabeth Joh,
Gautam Kamath,
Mark Lemley,
Cass Matthews,
Christine McLeavey,
Corynne McSherry
, et al. (10 additional authors not shown)
Abstract:
This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report…
▽ More
This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report with a high-level statement about why Generative AI is both immensely significant and immensely challenging for law. To meet these challenges, we conclude that there is an essential need for 1) a shared knowledge base that provides a common conceptual language for experts across disciplines; 2) clarification of the distinctive technical capabilities of generative-AI systems, as compared and contrasted to other computer and AI systems; 3) a logical taxonomy of the legal issues these systems raise; and, 4) a concrete research agenda to promote collaboration and knowledge-sharing on emerging issues at the intersection of Generative AI and law. In this report, we synthesize the key takeaways from the GenLaw workshop that begin to address these needs. All of the listed authors contributed to the workshop upon which this report is based, but they and their organizations do not necessarily endorse all of the specific claims in this report.
△ Less
Submitted 2 December, 2023; v1 submitted 10 November, 2023;
originally announced November 2023.
-
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
Authors:
Aaron Gokaslan,
A. Feder Cooper,
Jasmine Collins,
Landan Seguin,
Austin Jacobson,
Mihir Patel,
Jonathan Frankle,
Cory Stephenson,
Volodymyr Kuleshov
Abstract:
We assemble a dataset of Creative-Commons-licensed (CC) images, which we use to train a set of open diffusion models that are qualitatively competitive with Stable Diffusion 2 (SD2). This task presents two challenges: (1) high-resolution CC images lack the captions necessary to train text-to-image generative models; (2) CC images are relatively scarce. In turn, to address these challenges, we use…
▽ More
We assemble a dataset of Creative-Commons-licensed (CC) images, which we use to train a set of open diffusion models that are qualitatively competitive with Stable Diffusion 2 (SD2). This task presents two challenges: (1) high-resolution CC images lack the captions necessary to train text-to-image generative models; (2) CC images are relatively scarce. In turn, to address these challenges, we use an intuitive transfer learning technique to produce a set of high-quality synthetic captions paired with curated CC images. We then develop a data- and compute-efficient training recipe that requires as little as 3% of the LAION-2B data needed to train existing SD2 models, but obtains comparable quality. These results indicate that we have a sufficient number of CC images (~70 million) for training high-quality models. Our training recipe also implements a variety of optimizations that achieve ~3X training speed-ups, enabling rapid model iteration. We leverage this recipe to train several high-quality text-to-image models, which we dub the CommonCanvas family. Our largest model achieves comparable performance to SD2 on a human evaluation, despite being trained on our CC dataset that is significantly smaller than LAION and using synthetic captions for training. We release our models, data, and code at https://github.com/mosaicml/diffusion/blob/main/assets/common-canvas.md
△ Less
Submitted 25 October, 2023;
originally announced October 2023.
-
Talkin' 'Bout AI Generation: Copyright and the Generative-AI Supply Chain
Authors:
Katherine Lee,
A. Feder Cooper,
James Grimmelmann
Abstract:
"Does generative AI infringe copyright?" is an urgent question. It is also a difficult question, for two reasons. First, "generative AI" is not just one product from one company. It is a catch-all name for a massive ecosystem of loosely related technologies, including conversational text chatbots like ChatGPT, image generators like Midjourney and DALL-E, coding assistants like GitHub Copilot, and…
▽ More
"Does generative AI infringe copyright?" is an urgent question. It is also a difficult question, for two reasons. First, "generative AI" is not just one product from one company. It is a catch-all name for a massive ecosystem of loosely related technologies, including conversational text chatbots like ChatGPT, image generators like Midjourney and DALL-E, coding assistants like GitHub Copilot, and systems that compose music and create videos. These systems behave differently and raise different legal issues. The second problem is that copyright law is notoriously complicated, and generative-AI systems manage to touch on a great many corners of it: authorship, similarity, direct and indirect liability, fair use, and licensing, among much else. These issues cannot be analyzed in isolation, because there are connections everywhere.
In this Article, we aim to bring order to the chaos. To do so, we introduce the generative-AI supply chain: an interconnected set of stages that transform training data (millions of pictures of cats) into generations (a new, potentially never-seen-before picture of a cat that has never existed). Breaking down generative AI into these constituent stages reveals all of the places at which companies and users make choices that have copyright consequences. It enables us to trace the effects of upstream technical designs on downstream uses, and to assess who in these complicated sociotechnical systems bears responsibility for infringement when it happens. Because we engage so closely with the technology of generative AI, we are able to shed more light on the copyright questions. We do not give definitive answers as to who should and should not be held liable. Instead, we identify the key decisions that courts will need to make as they grapple with these issues, and point out the consequences that would likely flow from different liability regimes.
△ Less
Submitted 1 March, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Coordinating Distributed Example Orders for Provably Accelerated Training
Authors:
A. Feder Cooper,
Wentao Guo,
Khiem Pham,
Tiancheng Yuan,
Charlie F. Ruan,
Yucheng Lu,
Christopher De Sa
Abstract:
Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: whil…
▽ More
Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale-up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed RR on a variety of benchmark tasks.
△ Less
Submitted 21 December, 2023; v1 submitted 1 February, 2023;
originally announced February 2023.
-
Arbitrariness and Social Prediction: The Confounding Role of Variance in Fair Classification
Authors:
A. Feder Cooper,
Katherine Lee,
Madiha Zahrah Choksi,
Solon Barocas,
Christopher De Sa,
James Grimmelmann,
Jon Kleinberg,
Siddhartha Sen,
Baobao Zhang
Abstract:
Variance in predictions across different trained models is a significant, under-explored source of error in fair binary classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions: We: 1) Define a metric called self-consistency, derived…
▽ More
Variance in predictions across different trained models is a significant, under-explored source of error in fair binary classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions: We: 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair binary classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fair binary classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply any fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should reconsider how we choose to measure fairness in binary classification.
△ Less
Submitted 6 March, 2024; v1 submitted 27 January, 2023;
originally announced January 2023.
-
Fast or Accurate? Governing Conflicting Goals in Highly Autonomous Vehicles
Authors:
A. Feder Cooper,
Karen Levy
Abstract:
The tremendous excitement around the deployment of autonomous vehicles (AVs) comes from their purported promise. In addition to decreasing accidents, AVs are projected to usher in a new era of equity in human autonomy by providing affordable, accessible, and widespread mobility for disabled, elderly, and low-income populations. However, to realize this promise, it is necessary to ensure that AVs a…
▽ More
The tremendous excitement around the deployment of autonomous vehicles (AVs) comes from their purported promise. In addition to decreasing accidents, AVs are projected to usher in a new era of equity in human autonomy by providing affordable, accessible, and widespread mobility for disabled, elderly, and low-income populations. However, to realize this promise, it is necessary to ensure that AVs are safe for deployment, and to contend with the risks AV technology poses, which threaten to eclipse its benefits. In this Article, we focus on an aspect of AV engineering currently unexamined in the legal literature, but with critical implications for safety, accountability, liability, and power. Specifically, we explain how understanding the fundamental engineering trade-off between accuracy and speed in AVs is critical for policymakers to regulate the uncertainty and risk inherent in AV systems. We discuss how understanding the trade-off will help create tools that will enable policymakers to assess how the trade-off is being implemented. Such tools will facilitate opportunities for develo** concrete, ex ante AV safety standards and conclusive mechanisms for ex post determination of accountability after accidents occur. This will shift the balance of power from manufacturers to the public by facilitating effective regulation, reducing barriers to tort recovery, and ensuring that public values like safety and accountability are appropriately balanced.
△ Less
Submitted 3 August, 2022;
originally announced August 2022.
-
Non-Determinism and the Lawlessness of Machine Learning Code
Authors:
A. Feder Cooper,
Jonathan Frankle,
Christopher De Sa
Abstract:
Legal literature on machine learning (ML) tends to focus on harms, and thus tends to reason about individual model outcomes and summary error rates. This focus has masked important aspects of ML that are rooted in its reliance on randomness -- namely, stochasticity and non-determinism. While some recent work has begun to reason about the relationship between stochasticity and arbitrariness in lega…
▽ More
Legal literature on machine learning (ML) tends to focus on harms, and thus tends to reason about individual model outcomes and summary error rates. This focus has masked important aspects of ML that are rooted in its reliance on randomness -- namely, stochasticity and non-determinism. While some recent work has begun to reason about the relationship between stochasticity and arbitrariness in legal contexts, the role of non-determinism more broadly remains unexamined. In this paper, we clarify the overlap and differences between these two concepts, and show that the effects of non-determinism, and consequently its implications for the law, become clearer from the perspective of reasoning about ML outputs as distributions over possible outcomes. This distributional viewpoint accounts for randomness by emphasizing the possible outcomes of ML. Importantly, this type of reasoning is not exclusive with current legal reasoning; it complements (and in fact can strengthen) analyses concerning individual, concrete outcomes for specific automated decisions. By illuminating the important role of non-determinism, we demonstrate that ML code falls outside of the cyberlaw frame of treating ``code as law,'' as this frame assumes that code is deterministic. We conclude with a brief discussion of what work ML can do to constrain the potentially harm-inducing effects of non-determinism, and we indicate where the law must do work to bridge the gap between its current individual-outcome focus and the distributional approach that we recommend.
△ Less
Submitted 5 October, 2022; v1 submitted 23 June, 2022;
originally announced June 2022.
-
Four Years of FAccT: A Reflexive, Mixed-Methods Analysis of Research Contributions, Shortcomings, and Future Prospects
Authors:
Benjamin Laufer,
Sameer Jain,
A. Feder Cooper,
Jon Kleinberg,
Hoda Heidari
Abstract:
Fairness, Accountability, and Transparency (FAccT) for socio-technical systems has been a thriving area of research in recent years. An ACM conference bearing the same name has been the central venue for scholars in this area to come together, provide peer feedback to one another, and publish their work. This reflexive study aims to shed light on FAccT's activities to date and identify major gaps…
▽ More
Fairness, Accountability, and Transparency (FAccT) for socio-technical systems has been a thriving area of research in recent years. An ACM conference bearing the same name has been the central venue for scholars in this area to come together, provide peer feedback to one another, and publish their work. This reflexive study aims to shed light on FAccT's activities to date and identify major gaps and opportunities for translating contributions into broader positive impact. To this end, we utilize a mixed-methods research design. On the qualitative front, we develop a protocol for reviewing and coding prior FAccT papers, tracing their distribution of topics, methods, datasets, and disciplinary roots. We also design and administer a questionnaire to reflect the voices of FAccT community members and affiliates on a wide range of topics. On the quantitative front, we use the full text and citation network associated with prior FAccT publications to provide further evidence about topics and values represented in FAccT. We organize the findings from our analysis into four main dimensions: the themes present in FAccT scholarship, the values that underpin the work, the impact of the contributions both within academic circles and beyond, and the practices and informal norms of the community that has formed around FAccT. Finally, our work identifies several suggestions on directions for change, as voiced by community members.
△ Less
Submitted 14 June, 2022;
originally announced June 2022.
-
Repairing Regressors for Fair Binary Classification at Any Decision Threshold
Authors:
Kweku Kwegyir-Aggrey,
A. Feder Cooper,
Jessica Dai,
John Dickerson,
Keegan Hines,
Suresh Venkatasubramanian
Abstract:
We study the problem of post-processing a supervised machine-learned regressor to maximize fair binary classification at all decision thresholds. By decreasing the statistical distance between each group's score distributions, we show that we can increase fair performance across all thresholds at once, and that we can do so without a large decrease in accuracy. To this end, we introduce a formal m…
▽ More
We study the problem of post-processing a supervised machine-learned regressor to maximize fair binary classification at all decision thresholds. By decreasing the statistical distance between each group's score distributions, we show that we can increase fair performance across all thresholds at once, and that we can do so without a large decrease in accuracy. To this end, we introduce a formal measure of Distributional Parity, which captures the degree of similarity in the distributions of classifications for different protected groups. Our main result is to put forward a novel post-processing algorithm based on optimal transport, which provably maximizes Distributional Parity, thereby attaining common notions of group fairness like Equalized Odds or Equal Opportunity at all thresholds. We demonstrate on two fairness benchmarks that our technique works well empirically, while also outperforming and generalizing similar techniques from related work.
△ Less
Submitted 10 December, 2023; v1 submitted 14 March, 2022;
originally announced March 2022.
-
Accountability in an Algorithmic Society: Relationality, Responsibility, and Robustness in Machine Learning
Authors:
A. Feder Cooper,
Emanuel Moss,
Benjamin Laufer,
Helen Nissenbaum
Abstract:
In 1996, Accountability in a Computerized Society [95] issued a clarion call concerning the erosion of accountability in society due to the ubiquitous delegation of consequential functions to computerized systems. Nissenbaum [95] described four barriers to accountability that computerization presented, which we revisit in relation to the ascendance of data-driven algorithmic systems--i.e., machine…
▽ More
In 1996, Accountability in a Computerized Society [95] issued a clarion call concerning the erosion of accountability in society due to the ubiquitous delegation of consequential functions to computerized systems. Nissenbaum [95] described four barriers to accountability that computerization presented, which we revisit in relation to the ascendance of data-driven algorithmic systems--i.e., machine learning or artificial intelligence--to uncover new challenges for accountability that these systems present. Nissenbaum's original paper grounded discussion of the barriers in moral philosophy; we bring this analysis together with recent scholarship on relational accountability frameworks and discuss how the barriers present difficulties for instantiating a unified moral, relational framework in practice for data-driven algorithmic systems. We conclude by discussing ways of weakening the barriers in order to do so.
△ Less
Submitted 13 May, 2022; v1 submitted 10 February, 2022;
originally announced February 2022.
-
Making the Unaccountable Internet: The Changing Meaning of Accounting in the Early ARPANET
Authors:
A. Feder Cooper,
Gili Vidan
Abstract:
Contemporary concerns over the governance of technological systems often run up against narratives about the technical infeasibility of designing mechanisms for accountability. While in recent AI ethics literature these concerns have been deliberated predominantly in relation to ML, other instances in computing history also presented circumstances in which computer scientists needed to un-muddle w…
▽ More
Contemporary concerns over the governance of technological systems often run up against narratives about the technical infeasibility of designing mechanisms for accountability. While in recent AI ethics literature these concerns have been deliberated predominantly in relation to ML, other instances in computing history also presented circumstances in which computer scientists needed to un-muddle what it means to design accountable systems. One such compelling narrative can be found in canonical histories of the Internet that highlight how its original designers' commitment to the "End-to-End" architectural principle precluded other features from being implemented, resulting in the fast-growing, generative, but ultimately unaccountable network we have today. This paper offers a critique of such technologically essentialist notions of accountability and the characterization of the "unaccountable Internet" as an unintended consequence. It explores the changing meaning of accounting and its relationship to accountability in a selected corpus of requests for comments (RFCs) concerning the early Internet's design from the 1970s and 80s. We characterize four ways of conceptualizing accounting: as billing, as measurement, as management, and as policy, and demonstrate how an understanding of accountability was constituted through these shifting meanings. We link together the administrative and technical mechanisms of accounting for shared resources in a distributed system and an emerging notion of accountability as a social, political, and technical category, arguing that the former is constitutive of the latter. Recovering this history is not only important for understanding the processes that shaped the Internet, but also serves as a starting point for unpacking the complicated political choices that are involved in designing accountability mechanisms for other technological systems today.
△ Less
Submitted 11 May, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Tecnologica cosa: Modeling Storyteller Personalities in Boccaccio's Decameron
Authors:
A. Feder Cooper,
Maria Antoniak,
Christopher De Sa,
Marilyn Migiel,
David Mimno
Abstract:
We explore Boccaccio's Decameron to see how digital humanities tools can be used for tasks that have limited data in a language no longer in contemporary use: medieval Italian. We focus our analysis on the question: Do the different storytellers in the text exhibit distinct personalities? To answer this question, we curate and release a dataset based on the authoritative edition of the text. We us…
▽ More
We explore Boccaccio's Decameron to see how digital humanities tools can be used for tasks that have limited data in a language no longer in contemporary use: medieval Italian. We focus our analysis on the question: Do the different storytellers in the text exhibit distinct personalities? To answer this question, we curate and release a dataset based on the authoritative edition of the text. We use supervised classification methods to predict storytellers based on the stories they tell, confirming the difficulty of the task, and demonstrate that topic modeling can extract thematic storyteller "profiles."
△ Less
Submitted 21 September, 2021;
originally announced September 2021.
-
Model Selection's Disparate Impact in Real-World Deep Learning Applications
Authors:
Jessica Zosa Forde,
A. Feder Cooper,
Kweku Kwegyir-Aggrey,
Chris De Sa,
Michael Littman
Abstract:
Algorithmic fairness has emphasized the role of biased data in automated decision outcomes. Recently, there has been a shift in attention to sources of bias that implicate fairness in other stages in the ML pipeline. We contend that one source of such bias, human preferences in model selection, remains under-explored in terms of its role in disparate impact across demographic groups. Using a deep…
▽ More
Algorithmic fairness has emphasized the role of biased data in automated decision outcomes. Recently, there has been a shift in attention to sources of bias that implicate fairness in other stages in the ML pipeline. We contend that one source of such bias, human preferences in model selection, remains under-explored in terms of its role in disparate impact across demographic groups. Using a deep learning model trained on real-world medical imaging data, we verify our claim empirically and argue that choice of metric for model comparison, especially those that do not take variability into account, can significantly bias model selection outcomes.
△ Less
Submitted 7 September, 2021; v1 submitted 1 April, 2021;
originally announced April 2021.
-
Hyperparameter Optimization Is Deceiving Us, and How to Stop It
Authors:
A. Feder Cooper,
Yucheng Lu,
Jessica Zosa Forde,
Christopher De Sa
Abstract:
Recent empirical work shows that inconsistent results based on choice of hyperparameter optimization (HPO) configuration are a widespread problem in ML research. When comparing two algorithms J and K searching one subspace can yield the conclusion that J outperforms K, whereas searching another can entail the opposite. In short, the way we choose hyperparameters can deceive us. We provide a theore…
▽ More
Recent empirical work shows that inconsistent results based on choice of hyperparameter optimization (HPO) configuration are a widespread problem in ML research. When comparing two algorithms J and K searching one subspace can yield the conclusion that J outperforms K, whereas searching another can entail the opposite. In short, the way we choose hyperparameters can deceive us. We provide a theoretical complement to this prior work, arguing that, to avoid such deception, the process of drawing conclusions from HPO should be made more rigorous. We call this process epistemic hyperparameter optimization (EHPO), and put forth a logical framework to capture its semantics and how it can lead to inconsistent conclusions about performance. Our framework enables us to prove EHPO methods that are guaranteed to be defended against deception, given bounded compute time budget t. We demonstrate our framework's utility by proving and empirically validating a defended variant of random search.
△ Less
Submitted 25 October, 2021; v1 submitted 5 February, 2021;
originally announced February 2021.
-
Emergent Unfairness in Algorithmic Fairness-Accuracy Trade-Off Research
Authors:
A. Feder Cooper,
Ellen Abrams
Abstract:
Across machine learning (ML) sub-disciplines, researchers make explicit mathematical assumptions in order to facilitate proof-writing. We note that, specifically in the area of fairness-accuracy trade-off optimization scholarship, similar attention is not paid to the normative assumptions that ground this approach. Such assumptions presume that 1) accuracy and fairness are in inherent opposition t…
▽ More
Across machine learning (ML) sub-disciplines, researchers make explicit mathematical assumptions in order to facilitate proof-writing. We note that, specifically in the area of fairness-accuracy trade-off optimization scholarship, similar attention is not paid to the normative assumptions that ground this approach. Such assumptions presume that 1) accuracy and fairness are in inherent opposition to one another, 2) strict notions of mathematical equality can adequately model fairness, 3) it is possible to measure the accuracy and fairness of decisions independent from historical context, and 4) collecting more data on marginalized individuals is a reasonable solution to mitigate the effects of the trade-off. We argue that such assumptions, which are often left implicit and unexamined, lead to inconsistent conclusions: While the intended goal of this work may be to improve the fairness of machine learning models, these unexamined, implicit assumptions can in fact result in emergent unfairness. We conclude by suggesting a concrete path forward toward a potential resolution.
△ Less
Submitted 7 September, 2021; v1 submitted 1 February, 2021;
originally announced February 2021.
-
Where Is the Normative Proof? Assumptions and Contradictions in ML Fairness Research
Authors:
A. Feder Cooper
Abstract:
Across machine learning (ML) sub-disciplines researchers make mathematical assumptions to facilitate proof-writing. While such assumptions are necessary for providing mathematical guarantees for how algorithms behave, they also necessarily limit the applicability of these algorithms to different problem settings. This practice is known--in fact, obvious--and accepted in ML research. However, simil…
▽ More
Across machine learning (ML) sub-disciplines researchers make mathematical assumptions to facilitate proof-writing. While such assumptions are necessary for providing mathematical guarantees for how algorithms behave, they also necessarily limit the applicability of these algorithms to different problem settings. This practice is known--in fact, obvious--and accepted in ML research. However, similar attention is not paid to the normative assumptions that ground this work. I argue such assumptions are equally as important, especially in areas of ML with clear social impact, such as fairness. This is because, similar to how mathematical assumptions constrain applicability, normative assumptions also limit algorithm applicability to certain problem domains. I show that, in existing papers published in top venues, once normative assumptions are clarified, it is often possible to get unclear or contradictory results. While the mathematical assumptions and results are sound, the implicit normative assumptions and accompanying normative results contraindicate using these methods in practical fairness applications.
△ Less
Submitted 3 November, 2020; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Accuracy-Efficiency Trade-Offs and Accountability in Distributed ML Systems
Authors:
A. Feder Cooper,
Karen Levy,
Christopher De Sa
Abstract:
Trade-offs between accuracy and efficiency pervade law, public health, and other non-computing domains, which have developed policies to guide how to balance the two in conditions of uncertainty. While computer science also commonly studies accuracy-efficiency trade-offs, their policy implications remain poorly examined. Drawing on risk assessment practices in the US, we argue that, since examinin…
▽ More
Trade-offs between accuracy and efficiency pervade law, public health, and other non-computing domains, which have developed policies to guide how to balance the two in conditions of uncertainty. While computer science also commonly studies accuracy-efficiency trade-offs, their policy implications remain poorly examined. Drawing on risk assessment practices in the US, we argue that, since examining these trade-offs has been useful for guiding governance in other domains, we need to similarly reckon with these trade-offs in governing computer systems. We focus our analysis on distributed machine learning systems. Understanding the policy implications in this area is particularly urgent because such systems, which include autonomous vehicles, tend to be high-stakes and safety-critical. We 1) describe how the trade-off takes shape for these systems, 2) highlight gaps between existing US risk assessment standards and what these systems require to be properly assessed, and 3) make specific calls to action to facilitate accountability when hypothetical risks concerning the accuracy-efficiency trade-off become realized as accidents in the real world. We close by discussing how such accountability mechanisms encourage more just, transparent governance aligned with public values.
△ Less
Submitted 2 October, 2021; v1 submitted 4 July, 2020;
originally announced July 2020.
-
Asymptotically Optimal Exact Minibatch Metropolis-Hastings
Authors:
Ruqi Zhang,
A. Feder Cooper,
Christopher De Sa
Abstract:
Metropolis-Hastings (MH) is a commonly-used MCMC algorithm, but it can be intractable on large datasets due to requiring computations over the whole dataset. In this paper, we study minibatch MH methods, which instead use subsamples to enable scaling. We observe that most existing minibatch MH methods are inexact (i.e. they may change the target distribution), and show that this inexactness can ca…
▽ More
Metropolis-Hastings (MH) is a commonly-used MCMC algorithm, but it can be intractable on large datasets due to requiring computations over the whole dataset. In this paper, we study minibatch MH methods, which instead use subsamples to enable scaling. We observe that most existing minibatch MH methods are inexact (i.e. they may change the target distribution), and show that this inexactness can cause arbitrarily large errors in inference. We propose a new exact minibatch MH method, TunaMH, which exposes a tunable trade-off between its batch size and its theoretically guaranteed convergence rate. We prove a lower bound on the batch size that any minibatch MH method must use to retain exactness while guaranteeing fast convergence-the first such bound for minibatch MH-and show TunaMH is asymptotically optimal in terms of the batch size. Empirically, we show TunaMH outperforms other exact minibatch MH methods on robust linear regression, truncated Gaussian mixtures, and logistic regression.
△ Less
Submitted 22 October, 2020; v1 submitted 20 June, 2020;
originally announced June 2020.
-
AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC
Authors:
Ruqi Zhang,
A. Feder Cooper,
Christopher De Sa
Abstract:
Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is an efficient method for sampling from continuous distributions. It is a faster alternative to HMC: instead of using the whole dataset at each iteration, SGHMC uses only a subsample. This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution. One can prevent this using a step size that decays to ze…
▽ More
Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is an efficient method for sampling from continuous distributions. It is a faster alternative to HMC: instead of using the whole dataset at each iteration, SGHMC uses only a subsample. This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution. One can prevent this using a step size that decays to zero, but such a step size schedule can drastically slow down convergence. To address this tension, we propose a novel second-order SG-MCMC algorithm---AMAGOLD---that infrequently uses Metropolis-Hastings (M-H) corrections to remove bias. The infrequency of corrections amortizes their cost. We prove AMAGOLD converges to the target distribution with a fixed, rather than a diminishing, step size, and that its convergence rate is at most a constant factor slower than a full-batch baseline. We empirically demonstrate AMAGOLD's effectiveness on synthetic distributions, Bayesian logistic regression, and Bayesian neural networks.
△ Less
Submitted 29 February, 2020;
originally announced March 2020.