Showing 1–28 of 28 results for author: Dooley, S

  1. arXiv:2406.19314  [pdf, other]

    cs.CL cs.AI cs.LG

    LiveBench: A Challenging, Contamination-Free LLM Benchmark

    Authors: Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Ben Feuer, Siddhartha Jain, Ravid Shwartz-Ziv, Neel Jain, Khalid Saifullah, Siddartha Naidu, Chinmay Hegde, Yann LeCun, Tom Goldstein, Willie Neiswanger, Micah Goldblum

    Abstract: Test set contamination, wherein test data from a benchmark ends up in a newer model's training set, is a well-documented obstacle for fair LLM evaluation and can quickly render benchmarks obsolete. To mitigate this, many recent benchmarks crowdsource new prompts and evaluations from human or LLM judges; however, these can introduce significant biases, and break down when scoring hard questions. In…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.08391  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Large Language Models Must Be Taught to Know What They Don't Know

    Authors: Sanyam Kapoor, Nate Gruver, Manley Roberts, Katherine Collins, Arka Pal, Umang Bhatt, Adrian Weller, Samuel Dooley, Micah Goldblum, Andrew Gordon Wilson

    Abstract: When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibrati…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Code available at: https://github.com/activatedgeek/calibration-tuning

  3. arXiv:2402.18213  [pdf, other]

    cs.LG cs.CV stat.ML

    Multi-objective Differentiable Neural Architecture Search

    Authors: Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter

    Abstract: Pareto front profiling in multi-objective optimization (MOO), i.e. finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives like neural network training. Typically, in MOO neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints in…

    Submitted 19 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 37 pages, 27 figures

  4. arXiv:2402.13228  [pdf, other]

    cs.CL cs.AI cs.LG

    Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

    Authors: Arka Pal, Deep Karkhanis, Samuel Dooley, Manley Roberts, Siddartha Naidu, Colin White

    Abstract: Direct Preference Optimisation (DPO) is effective at significantly improving the performance of large language models (LLMs) on downstream tasks such as reasoning, summarisation, and alignment. Using pairs of preferred and dispreferred data, DPO models the relative probability of picking one response over another. In this work, first we show theoretically that the standard DPO loss can lead to a r…

    Submitted 3 July, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  5. arXiv:2311.01933  [pdf, other]

    cs.LG

    ForecastPFN: Synthetically-Trained Zero-Shot Forecasting

    Authors: Samuel Dooley, Gurnoor Singh Khurana, Chirag Mohapatra, Siddartha Naidu, Colin White

    Abstract: The vast majority of time-series forecasting approaches require a substantial training dataset. However, many real-life forecasting applications have very little initial observations, sometimes just 40 or fewer. Thus, the applicability of most forecasting methods is restricted in data-sparse commercial applications. While there is recent work in the setting of very limited initial data (so-called…

    Submitted 3 November, 2023; originally announced November 2023.

    Journal ref: Thirty-seventh Conference on Neural Information Processing Systems, 2023

  6. arXiv:2310.12145  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Fairer and More Accurate Tabular Models Through NAS

    Authors: Richeek Das, Samuel Dooley

    Abstract: Making models algorithmically fairer in tabular data has been long studied, with techniques typically oriented towards fixes which usually take a neural model with an undesirable outcome and make changes to how the data are ingested, what the model weights are, or how outputs are processed. We employ an emergent and different strategy where we consider updating the model's architecture and trainin…

    Submitted 18 October, 2023; originally announced October 2023.

  7. arXiv:2310.10628  [pdf, other]

    cs.CL

    Data Contamination Through the Lens of Time

    Authors: Manley Roberts, Himanshu Thakur, Christine Herlihy, Colin White, Samuel Dooley

    Abstract: Recent claims about the impressive abilities of large language models (LLMs) are often supported by evaluating publicly available benchmarks. Since LLMs train on wide swaths of the internet, this practice raises concerns of data contamination, i.e., evaluating on examples that are explicitly or implicitly included in the training data. Data contamination remains notoriously challenging to measure…

    Submitted 16 October, 2023; originally announced October 2023.

  8. arXiv:2308.10882  [pdf, other]

    cs.AI cs.CL

    Giraffe: Adventures in Expanding Context Lengths in LLMs

    Authors: Arka Pal, Deep Karkhanis, Manley Roberts, Samuel Dooley, Arvind Sundararajan, Siddartha Naidu

    Abstract: Modern large language models (LLMs) that rely on attention mechanisms are typically trained with fixed context lengths which enforce upper limits on the length of input sequences that they can handle at evaluation time. To use these models on sequences longer than the train-time context length, one might employ techniques from the growing family of context length extrapolation methods -- most of w…

    Submitted 21 August, 2023; originally announced August 2023.

  9. arXiv:2211.15937  [pdf, other]

    cs.CY cs.AI cs.CV cs.LG

    Robustness Disparities in Face Detection

    Authors: Samuel Dooley, George Z. Wei, Tom Goldstein, John P. Dickerson

    Abstract: Facial analysis systems have been deployed by large companies and critiqued by scholars and activists for the past decade. Many existing algorithmic audits examine the performance of these systems on later stage elements of facial analysis systems like facial recognition and age, emotion, or perceived gender prediction; however, a core component to these systems has been vastly understudied from a…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: NeurIPS Datasets & Benchmarks Track 2022

  10. arXiv:2211.03554  [pdf, other]

    cs.AI cs.HC

    How Technology Impacts and Compares to Humans in Socially Consequential Arenas

    Authors: Samuel Dooley

    Abstract: One of the main promises of technology development is for it to be adopted by people, organizations, societies, and governments -- incorporated into their life, work stream, or processes. Often, this is socially beneficial as it automates mundane tasks, frees up more time for other more important things, or otherwise improves the lives of those who use the technology. However, these beneficial res…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Doctoral thesis proposal. arXiv admin note: substantial text overlap with arXiv:2110.08396, arXiv:2108.12508, arXiv:2006.12621

  11. arXiv:2210.09943  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition

    Authors: Samuel Dooley, Rhea Sanjay Sukthanker, John P. Dickerson, Colin White, Frank Hutter, Micah Goldblum

    Abstract: Face recognition systems are widely deployed in safety-critical applications, including law enforcement, yet they exhibit bias across a range of socio-demographic dimensions, such as gender and race. Conventional wisdom dictates that model biases arise from biased training data. As a consequence, previous works on bias mitigation largely focused on pre-processing the training data, adding penaltie…

    Submitted 6 December, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

  12. arXiv:2203.08235  [pdf, other]

    cs.CV cs.LG

    A Deep Dive into Dataset Imbalance and Bias in Face Identification

    Authors: Valeriia Cherepanova, Steven Reich, Samuel Dooley, Hossein Souri, Micah Goldblum, Tom Goldstein

    Abstract: As the deployment of automated face recognition (FR) systems proliferates, bias in these systems is not just an academic question, but a matter of public concern. Media portrayals often center imbalance as the main source of bias, i.e., that FR models perform worse on images of non-white people or women because these demographic groups are underrepresented in training data. Recent academic researc…

    Submitted 15 March, 2022; originally announced March 2022.

  13. arXiv:2203.00565  [pdf, other]

    cs.CL math.AT

    Topological Data Analysis for Word Sense Disambiguation

    Authors: Michael Rawson, Samuel Dooley, Mithun Bharadwaj, Rishabh Choudhary

    Abstract: We develop and test a novel unsupervised algorithm for word sense induction and disambiguation which uses topological data analysis. Typical approaches to the problem involve clustering, based on simple low level features of distance in word embeddings. Our approach relies on advanced mathematical concepts in the field of topology which provides a richer conceptualization of clusters for the word…

    Submitted 1 March, 2022; originally announced March 2022.

  14. arXiv:2202.11095  [pdf, other]

    cs.GT cs.AI cs.DS

    The Dichotomous Affiliate Stable Matching Problem: Approval-Based Matching with Applicant-Employer Relations

    Authors: Marina Knittel, Samuel Dooley, John P. Dickerson

    Abstract: While the stable marriage problem and its variants model a vast range of matching markets, they fail to capture complex agent relationships, such as the affiliation of applicants and employers in an interview marketplace. To model this problem, the existing literature on matching with externalities permits agents to provide complete and total rankings over matchings based off of both their own and…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: 19 pages, 2 figures

  15. arXiv:2202.10194  [pdf]

    physics.chem-ph cs.DC cs.LG physics.comp-ph stat.ML

    Low-Dimensional High-Fidelity Kinetic Models for NOX Formation by a Compute Intensification Method

    Authors: Mark Kelly, Harry Dunne, Gilles Bourque, Stephen Dooley

    Abstract: A novel compute intensification methodology to the construction of low-dimensional, high-fidelity "compact" kinetic models for NOX formation is designed and demonstrated. The method adapts the data intensive Machine Learned Optimization of Chemical Kinetics (MLOCK) algorithm for compact model generation by the use of a Latin Square method for virtual reaction network generation. A set of logical r…

    Submitted 21 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: text overlap with arXiv:2202.08021

  16. arXiv:2202.08021  [pdf]

    physics.chem-ph cs.DC cs.LG physics.comp-ph stat.ML

    Toward Development of Machine Learned Techniques for Production of Compact Kinetic Models

    Authors: Mark Kelly, Mark Fortune, Gilles Bourque, Stephen Dooley

    Abstract: Chemical kinetic models are an essential component in the development and optimisation of combustion devices through their coupling to multi-dimensional simulations such as computational fluid dynamics (CFD). Low-dimensional kinetic models which retain good fidelity to the reality are needed, the production of which requires considerable human-time cost and expert knowledge. Here, we present a nov…

    Submitted 16 February, 2022; originally announced February 2022.

  17. arXiv:2201.10047   

    cs.CV cs.AI cs.CY cs.LG

    Are Commercial Face Detection Models as Biased as Academic Models?

    Authors: Samuel Dooley, George Z. Wei, Tom Goldstein, John P. Dickerson

    Abstract: As facial recognition systems are deployed more widely, scholars and activists have studied their biases and harms. Audits are commonly used to accomplish this and compare the algorithmic facial recognition systems' performance against datasets with various metadata labels about the subjects of the images. Seminal works have found discrepancies in performance by gender expression, age, perceived r…

    Submitted 29 November, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: This preprint and arXiv:2108.12508 were combined and a more rigorous analysis added to result in the NeurIPS Datasets & Benchmark 2022 paper arXiv:2211.15937

  18. arXiv:2110.09437  [pdf, other]

    cs.CY cs.CR cs.HC

    Ctrl-Shift: How Privacy Sentiment Changed from 2019 to 2021

    Authors: Angelica Goetzen, Samuel Dooley, Elissa M. Redmiles

    Abstract: People's privacy sentiments influence changes in legislation as well as technology design and use. While single-point-in-time investigations of privacy sentiment offer useful insight, study of people's privacy sentiments over time is also necessary to better understand and anticipate evolving privacy attitudes. In this work, we use repeated cross-sectional surveys (n=6,676) to model the sentiments…

    Submitted 15 March, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

  19. arXiv:2110.08396  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Comparing Human and Machine Bias in Face Recognition

    Authors: Samuel Dooley, Ryan Downing, George Wei, Nathan Shankar, Bradon Thymes, Gudrun Thorkelsdottir, Tiye Kurtz-Miott, Rachel Mattson, Olufemi Obiwumi, Valeriia Cherepanova, Micah Goldblum, John P Dickerson, Tom Goldstein

    Abstract: Much recent research has uncovered and discussed serious concerns of bias in facial analysis technologies, finding performance disparities between groups of people based on perceived gender, skin type, lighting condition, etc. These audits are immensely important and successful at measuring algorithmic bias but have two major challenges: the audits (1) use facial recognition datasets which lack qu…

    Submitted 25 October, 2021; v1 submitted 15 October, 2021; originally announced October 2021.

  20. arXiv:2108.12508  [pdf, other]

    cs.CY cs.AI cs.CV cs.LG

    Robustness Disparities in Commercial Face Detection

    Authors: Samuel Dooley, Tom Goldstein, John P. Dickerson

    Abstract: Facial detection and analysis systems have been deployed by large companies and critiqued by scholars and activists for the past decade. Critiques that focus on system performance analyze disparity of the system's output, i.e., how frequently is a face detected for different Fitzpatrick skin types or perceived genders. However, we focus on the robustness of these system outputs under noisy natural…

    Submitted 27 August, 2021; originally announced August 2021.

  21. arXiv:2106.03215  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    PreferenceNet: Encoding Human Preferences in Auction Design with Deep Learning

    Authors: Neehar Peri, Michael J. Curry, Samuel Dooley, John P. Dickerson

    Abstract: The design of optimal auctions is a problem of interest in economics, game theory and computer science. Despite decades of effort, strategyproof, revenue-maximizing auction designs are still not known outside of restricted settings. However, recent methods using deep learning have shown some success in approximating optimal auctions, recovering several known solutions and outperforming strong base…

    Submitted 17 October, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: This work has been accepted to Neural Information Processing Systems (NeurIPS) 2021. First two authors contributed equally

  22. arXiv:2010.06398  [pdf, other]

    cs.GT cs.LG

    ProportionNet: Balancing Fairness and Revenue for Auction Design with Deep Learning

    Authors: Kevin Kuo, Anthony Ostuni, Elizabeth Horishny, Michael J. Curry, Samuel Dooley, Ping-yeh Chiang, Tom Goldstein, John P. Dickerson

    Abstract: The design of revenue-maximizing auctions with strong incentive guarantees is a core concern of economic theory. Computational auctions enable online advertising, sourcing, spectrum allocation, and myriad financial markets. Analytic progress in this space is notoriously difficult; since Myerson's 1981 work characterizing single-item optimal auctions, there has been limited progress outside of rest…

    Submitted 13 October, 2020; originally announced October 2020.

  23. arXiv:2009.11867  [pdf, other]

    econ.GN cs.AI cs.CY cs.DS cs.GT

    The Affiliate Matching Problem: On Labor Markets where Firms are Also Interested in the Placement of Previous Workers

    Authors: Samuel Dooley, John P. Dickerson

    Abstract: In many labor markets, workers and firms are connected via affiliative relationships. A management consulting firm wishes to both accept the best new workers but also place its current affiliated workers at strong firms. Similarly, a research university wishes to hire strong job market candidates while also placing its own candidates at strong peer universities. We model this affiliate matching pr…

    Submitted 23 September, 2020; originally announced September 2020.

  24. arXiv:2006.12621  [pdf, other]

    cs.LG cs.CY

    Fairness Through Robustness: Investigating Robustness Disparity in Deep Learning

    Authors: Vedant Nanda, Samuel Dooley, Sahil Singla, Soheil Feizi, John P. Dickerson

    Abstract: Deep neural networks (DNNs) are increasingly used in real-world applications (e.g. facial recognition). This has resulted in concerns about the fairness of decisions made by these models. Various notions and measures of fairness have been proposed to ensure that a decision-making system does not disproportionately harm (or benefit) particular subgroups of the population. In this paper, we argue th…

    Submitted 21 January, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: Accepted at ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2021

  25. arXiv:2001.09742  [pdf, ps, other]

    cs.CY

    Can an Algorithm be My Healthcare Proxy?

    Authors: Duncan C McElfresh, Samuel Dooley, Yuan Cui, Kendra Griesman, Weiqin Wang, Tyler Will, Neil Sehgal, John P Dickerson

    Abstract: Planning for death is not a process in which everyone participates. Yet a lack of planning can have vast impacts on a patient's well-being, the well-being of her family, and the medical community as a whole. Advance Care Planning (ACP) has been a field in the United States for a half-century. Many modern techniques prompting patients to think about end of life (EOL) involve short surveys or questi…

    Submitted 7 January, 2020; originally announced January 2020.

    Comments: Accepted for a poster presentation at the 4th International Workshop on Health Intelligence (W3PHIAI-20), colocated with AAAI 2020

  26. arXiv:1808.02443  [pdf, other]

    cs.CV

    Overhead Detection: Beyond 8-bits and RGB

    Authors: Eliza Mace, Keith Manville, Monica Barbu-McInnis, Michael Laielli, Matthew Klaric, Samuel Dooley

    Abstract: This study uses the challenging and publicly available SpaceNet dataset to establish a performance baseline for a state-of-the-art object detector in satellite imagery. Specifically, we examine how various features of the data affect building detection accuracy with respect to the Intersection over Union metric. We demonstrate that the performance of the R-FCN detection algorithm on imagery with a…

    Submitted 7 August, 2018; originally announced August 2018.

    Comments: 10 pages, 8 figures, 2 tables

    Journal ref: Naval Applications of Machine Learning, February 13, 2018

  27. arXiv:1802.07856  [pdf, other]

    cs.CV

    xView: Objects in Context in Overhead Imagery

    Authors: Darius Lam, Richard Kuzma, Kevin McGee, Samuel Dooley, Michael Laielli, Matthew Klaric, Yaroslav Bulatov, Brendan McCord

    Abstract: We introduce a new large-scale dataset for the advancement of object detection techniques and overhead object detection research. This satellite imagery dataset enables research progress pertaining to four key computer vision frontiers. We utilize a novel process for geospatial category detection and bounding box annotation with three stages of quality control. Our data is collected from WorldView…

    Submitted 21 February, 2018; originally announced February 2018.

    Comments: Initial submission

  28. arXiv:1204.4459  [pdf]

    cs.NI

    An Interference-Aware Virtual Clustering Paradigm for Resource Management in Cognitive Femtocell Networks

    Authors: Faisal Tariq, Laurence S. Dooley, Adrian S. Poulton

    Abstract: Femtocells represent a promising alternative solution for high quality wireless access in indoor scenarios where conventional cellular system coverage can be poor. Femtocell access points (FAP) are normally randomly deployed by the end user, so only post deployment network planning is possible. Furthermore, this uncoordinated deployment creates the potential for severe interference to co-located f…

    Submitted 19 April, 2012; originally announced April 2012.