Skip to main content

Showing 1–50 of 59 results for author: Koyejo, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.16797  [pdf, other

    cs.CL cs.AI

    Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

    Authors: Ashwinee Panda, Berivan Isik, Xiangyu Qi, Sanmi Koyejo, Tsachy Weissman, Prateek Mittal

    Abstract: Existing methods for adapting large language models (LLMs) to new tasks are not suited to multi-task adaptation because they modify all the model weights -- causing destructive interference between tasks. The resulting effects, such as catastrophic forgetting of earlier tasks, make it challenging to obtain good performance on multiple tasks at the same time. To mitigate this, we propose Lottery Ti… ▽ More

    Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.12785  [pdf, other

    cs.LG

    In-Context Learning of Energy Functions

    Authors: Rylan Schaeffer, Mikail Khona, Sanmi Koyejo

    Abstract: In-context learning is a powerful capability of certain machine learning models that arguably underpins the success of today's frontier AI models. However, in-context learning is critically limited to settings where the in-context distribution of interest $p_θ^{ICL}( x|\mathcal{D})$ can be straightforwardly expressed and/or parameterized by the model; for instance, language modeling relies on expr… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Proceedings of the 1st Workshop on In-Context Learning at the 41st International Conference on Machine Learning, Vienna, Austria. 2024. arXiv admin note: text overlap with arXiv:2402.10202

  3. arXiv:2406.10229  [pdf, other

    cs.LG cs.AI

    Quantifying Variance in Evaluation Benchmarks

    Authors: Lovish Madaan, Aaditya K. Singh, Rylan Schaeffer, Andrew Poulton, Sanmi Koyejo, Pontus Stenetorp, Sharan Narang, Dieuwke Hupkes

    Abstract: Evaluation benchmarks are the cornerstone of measuring capabilities of large language models (LLMs), as well as driving progress in said capabilities. Originally designed to make claims about capabilities (or lack thereof) in fully pretrained models, evaluation benchmarks are now also extensively used to decide between various training choices. Despite this widespread usage, we rarely quantify the… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  4. arXiv:2406.09561  [pdf, other

    cs.LG cs.AI

    Label Noise Robustness for Domain-Agnostic Fair Corrections via Nearest Neighbors Label Spreading

    Authors: Nathan Stromberg, Rohan Ayyagari, Sanmi Koyejo, Richard Nock, Lalitha Sankar

    Abstract: Last-layer retraining methods have emerged as an efficient framework for correcting existing base models. Within this framework, several methods have been proposed to deal with correcting models for subgroup fairness with and without group membership information. Importantly, prior work has demonstrated that many methods are susceptible to noisy labels. To this end, we propose a drop-in correction… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2406.09366  [pdf, other

    cs.LG cs.CV q-bio.NC

    Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

    Authors: Rylan Schaeffer, Victor Lecomte, Dhruv Bhandarkar Pai, Andres Carranza, Berivan Isik, Alyssa Unell, Mikail Khona, Thomas Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo

    Abstract: Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to impro… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  6. arXiv:2406.04391  [pdf, other

    cs.LG cs.AI cs.CL

    Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

    Authors: Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda, Gabriel Mukobi, Varun Madan, Adam Ibrahim, Herbie Bradley, Stella Biderman, Sanmi Koyejo

    Abstract: Predictable behavior from scaling advanced AI systems is an extremely desirable property. Although a well-established literature exists on how pretraining performance scales, the literature on how particular downstream capabilities scale is significantly muddier. In this work, we take a step back and ask: why has predicting specific downstream capabilities with scale remained elusive? While many f… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  7. arXiv:2406.01013  [pdf, other

    cs.LG cs.CL

    Scalable Ensembling For Mitigating Reward Overoptimisation

    Authors: Ahmed M. Ahmed, Rafael Rafailov, Stepan Sharkov, Xuechen Li, Sanmi Koyejo

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has enabled significant advancements within language modeling for powerful, instruction-following models. However, the alignment of these models remains a pressing challenge as the policy tends to overfit the learned ``proxy" reward model past an inflection point of utility as measured by a ``gold" reward model that is more performant -- a phenomen… ▽ More

    Submitted 18 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  8. arXiv:2405.17512  [pdf, other

    cs.LG cs.AI cs.CY

    On Fairness of Low-Rank Adaptation of Large Models

    Authors: Zhoujie Ding, Ken Ziyu Liu, Pura Peetathawatchai, Berivan Isik, Sanmi Koyejo

    Abstract: Low-rank adaptation of large models, particularly LoRA, has gained traction due to its computational efficiency. This efficiency, contrasted with the prohibitive costs of full-model fine-tuning, means that practitioners often turn to LoRA and sometimes without a complete understanding of its ramifications. In this study, we focus on fairness and ask whether LoRA has an unexamined impact on utility… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  9. arXiv:2404.16277  [pdf, ps, other

    cs.LG stat.ML

    Causally Inspired Regularization Enables Domain General Representations

    Authors: Olawale Salaudeen, Sanmi Koyejo

    Abstract: Given a causal graph representing the data-generating process shared across different domains/distributions, enforcing sufficient graph-implied conditional independencies can identify domain-general (non-spurious) feature representations. For the standard input-output predictive setting, we categorize the set of graphs considered in the literature into two distinct groups: (i) those in which the e… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  10. arXiv:2404.01413  [pdf, other

    cs.LG cs.AI cs.CL cs.ET stat.ML

    Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

    Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

    Abstract: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration… ▽ More

    Submitted 29 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  11. arXiv:2403.07442  [pdf, other

    cs.LG stat.ML

    Proxy Methods for Domain Adaptation

    Authors: Katherine Tsai, Stephen R. Pfohl, Olawale Salaudeen, Nicole Chiou, Matt J. Kusner, Alexander D'Amour, Sanmi Koyejo, Arthur Gretton

    Abstract: We study the problem of domain adaptation under distribution shift, where the shift is due to a change in the distribution of an unobserved, latent variable that confounds both the covariates and the labels. In this setting, neither the covariate shift nor the label shift assumptions apply. Our approach to adaptation employs proximal causal learning, a technique for estimating causal effects in se… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

  12. arXiv:2403.03357  [pdf, other

    cs.AI cs.CY

    The Case for Globalizing Fairness: A Mixed Methods Study on Colonialism, AI, and Health in Africa

    Authors: Mercy Asiedu, Awa Dieng, Iskandar Haykel, Negar Rostamzadeh, Stephen Pfohl, Chirag Nagpal, Maria Nagawa, Abigail Oppong, Sanmi Koyejo, Katherine Heller

    Abstract: With growing application of machine learning (ML) technologies in healthcare, there have been calls for develo** techniques to understand and mitigate biases these systems may exhibit. Fair-ness considerations in the development of ML-based solutions for health have particular implications for Africa, which already faces inequitable power imbalances between the Global North and South.This paper… ▽ More

    Submitted 11 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures. arXiv admin note: text overlap with arXiv:2304.02190

  13. arXiv:2403.02715  [pdf, other

    cs.CL cs.AI

    Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models

    Authors: Sang T. Truong, Duc Q. Nguyen, Toan Nguyen, Dong D. Le, Nhi N. Truong, Tho Quan, Sanmi Koyejo

    Abstract: Recent advancements in large language models (LLMs) have underscored their importance in the evolution of artificial intelligence. However, despite extensive pretraining on multilingual datasets, available open-sourced LLMs exhibit limited effectiveness in processing Vietnamese. The challenge is exacerbated by the absence of systematic benchmark datasets and metrics tailored for Vietnamese LLM eva… ▽ More

    Submitted 26 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 51 pages

    MSC Class: 68T50

  14. arXiv:2402.11039  [pdf, other

    cs.LG stat.ML

    Robustness to Subpopulation Shift with Domain Label Noise via Regularized Annotation of Domains

    Authors: Nathan Stromberg, Rohan Ayyagari, Monica Welfert, Sanmi Koyejo, Richard Nock, Lalitha Sankar

    Abstract: Existing methods for last layer retraining that aim to optimize worst-group accuracy (WGA) rely heavily on well-annotated groups in the training data. We show, both in theory and practice, that annotation-based data augmentations using either downsampling or upweighting for WGA are susceptible to domain annotation noise, and in high-noise regimes approach the WGA of a model trained with vanilla em… ▽ More

    Submitted 26 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Generalized Gaussian assumption

  15. arXiv:2402.10202  [pdf, other

    cs.LG

    Bridging Associative Memory and Probabilistic Modeling

    Authors: Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang Truong, Yilun Du, Mitchell Ostrow, Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo

    Abstract: Associative memory and probabilistic modeling are two fundamental topics in artificial intelligence. The first studies recurrent neural networks designed to denoise, complete and retrieve data, whereas the second studies learning and sampling from probability distributions. Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log like… ▽ More

    Submitted 13 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  16. arXiv:2402.08787  [pdf, other

    cs.LG cs.CL

    Rethinking Machine Unlearning for Large Language Models

    Authors: Sijia Liu, Yuanshun Yao, **ghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Yuguang Yao, Chris Yuhao Liu, Xiaojun Xu, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu

    Abstract: We explore machine unlearning (MU) in the domain of large language models (LLMs), referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities, while maintaining the integrity of essential knowledge generation and not affecting causally unrelated information. We envision LLM unlearning bec… ▽ More

    Submitted 1 July, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  17. arXiv:2402.04177  [pdf, other

    cs.CL cs.LG stat.ML

    Scaling Laws for Downstream Task Performance of Large Language Models

    Authors: Berivan Isik, Natalia Ponomareva, Hussein Hazimeh, Dimitris Paparas, Sergei Vassilvitskii, Sanmi Koyejo

    Abstract: Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

  18. arXiv:2402.00742  [pdf, other

    cs.CL cs.AI

    Transforming and Combining Rewards for Aligning Large Language Models

    Authors: Zihao Wang, Chirag Nagpal, Jonathan Berant, Jacob Eisenstein, Alex D'Amour, Sanmi Koyejo, Victor Veitch

    Abstract: A common approach for aligning language models to human preferences is to first learn a reward model from preference data, and then use this reward model to update the language model. We study two closely related problems that arise in this approach. First, any monotone transformation of the reward model preserves preference ranking; is there a choice that is ``better'' than others? Second, we oft… ▽ More

    Submitted 1 February, 2024; originally announced February 2024.

    MSC Class: 68T50 ACM Class: I.2

  19. arXiv:2401.06059  [pdf, other

    cs.CL cs.AI cs.LG

    Investigating Data Contamination for Pre-training Language Models

    Authors: Minhao Jiang, Ken Ziyu Liu, Ming Zhong, Rylan Schaeffer, Siru Ouyang, Jiawei Han, Sanmi Koyejo

    Abstract: Language models pre-trained on web-scale corpora demonstrate impressive capabilities on diverse downstream tasks. However, there is increasing concern whether such capabilities might arise from evaluation datasets being included in the pre-training corpus -- a phenomenon known as \textit{data contamination} -- in a manner that artificially increases performance. There has been little understanding… ▽ More

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 16 pages, 5 figures

  20. arXiv:2312.03096  [pdf, other

    cs.LG cs.AI cs.NE

    What Causes Polysemanticity? An Alternative Origin Story of Mixed Selectivity from Incidental Causes

    Authors: Victor Lecomte, Kushal Thaman, Rylan Schaeffer, Naomi Bashkansky, Trevor Chow, Sanmi Koyejo

    Abstract: Polysemantic neurons -- neurons that activate for a set of unrelated features -- have been seen as a significant obstacle towards interpretability of task-optimized deep networks, with implications for AI safety. The classic origin story of polysemanticity is that the data contains more ``features" than neurons, such that learning to perform a task forces the network to co-allocate multiple unrela… ▽ More

    Submitted 13 February, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  21. arXiv:2311.02316  [pdf, other

    cs.LG cs.NE

    Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells

    Authors: Rylan Schaeffer, Mikail Khona, Tzuhsuan Ma, Cristóbal Eyzaguirre, Sanmi Koyejo, Ila Rani Fiete

    Abstract: To solve the spatial problems of map**, localization and navigation, the mammalian lineage has developed striking spatial representations. One important spatial representation is the Nobel-prize winning grid cells: neurons that represent self-location, a local and aperiodic quantity, with seemingly bizarre non-local and spatially periodic activity patterns of a few discrete periods. Why has the… ▽ More

    Submitted 3 November, 2023; originally announced November 2023.

  22. arXiv:2310.13807  [pdf, other

    cs.LG

    Learning to (Learn at Test Time)

    Authors: Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Sanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen

    Abstract: We reformulate the problem of supervised learning as learning to learn with two nested loops (i.e. learning problems). The inner loop learns on each individual instance with self-supervision before final prediction. The outer loop learns the self-supervised task used by the inner loop, such that its final prediction improves. Our inner loop turns out to be equivalent to linear attention when the i… ▽ More

    Submitted 7 January, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: Fixed a few small typos

  23. arXiv:2310.01405  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.CY

    Representation Engineering: A Top-Down Approach to AI Transparency

    Authors: Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

    Abstract: In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience. RepE places population-level representations, rather than neurons or circuits, at the center of analysis, equip** us with novel methods for monitoring and manipulating high-level cognitive p… ▽ More

    Submitted 10 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Code is available at https://github.com/andyzoujm/representation-engineering

  24. arXiv:2307.10573  [pdf, other

    cs.AI

    Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

    Authors: Rylan Schaeffer, Kateryna Pistunova, Samar Khanna, Sarthak Consul, Sanmi Koyejo

    Abstract: Language models can be prompted to reason through problems in a manner that significantly improves performance. However, \textit{why} such prompting improves performance is unclear. Recent work showed that using logically \textit{invalid} Chain-of-Thought (CoT) prompting improves performance almost as much as logically \textit{valid} CoT prompting, and that editing CoT prompts to replace problem-s… ▽ More

    Submitted 22 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: ICML 2023 Workshop: Knowledge and Logical Reasoning in the Era of Data-driven Learning

  25. arXiv:2307.10569  [pdf, ps, other

    cs.LG cs.AI

    Deceptive Alignment Monitoring

    Authors: Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo

    Abstract: As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a seemingly reasonable manner, while secretly and subtly modifying its behavior for ulterior reasons is often referred to as deceptive alignment in the AI Safety &… ▽ More

    Submitted 25 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted as BlueSky Oral to 2023 ICML AdvML Workshop

  26. arXiv:2307.10563  [pdf, other

    cs.LG cs.AI

    FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation

    Authors: Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo

    Abstract: We present FACADE, a novel probabilistic and geometric framework designed for unsupervised mechanistic anomaly detection in deep neural networks. Its primary goal is advancing the understanding and mitigation of adversarial attacks. FACADE aims to generate probabilistic distributions over circuits, which provide critical insights to their contribution to changes in the manifold properties of pseud… ▽ More

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted as BlueSky Poster at 2023 ICML AdvML Workshop

  27. arXiv:2306.13841  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    Is Pre-training Truly Better Than Meta-Learning?

    Authors: Brando Miranda, Patrick Yu, Saumya Goyal, Yu-Xiong Wang, Sanmi Koyejo

    Abstract: In the context of few-shot learning, it is currently believed that a fixed pre-trained (PT) model, along with fine-tuning the final layer during evaluation, outperforms standard meta-learning algorithms. We re-evaluate these claims under an in-depth empirical examination of an extensive set of formally diverse datasets and compare PT to Model Agnostic Meta-Learning (MAML). Unlike previous work, we… ▽ More

    Submitted 23 June, 2023; originally announced June 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning 2023 DMLR Workshop

  28. arXiv:2306.13840  [pdf, other

    cs.CL cs.AI cs.LG cs.NE

    Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data

    Authors: Alycia Lee, Brando Miranda, Sudharsan Sundar, Sanmi Koyejo

    Abstract: Current trends to pre-train capable Large Language Models (LLMs) mostly focus on scaling of model and dataset size. However, the quality of pre-training data is an important factor for training powerful LLMs, yet it is a nebulous concept that has not been fully characterized. Therefore, we use the recently proposed Task2Vec diversity coefficient to ground and understand formal aspects of data qual… ▽ More

    Submitted 26 September, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning DMLR 2023

  29. arXiv:2306.12625  [pdf, other

    cs.LG cs.DC stat.ML

    Adaptive Compression in Federated Learning via Side Information

    Authors: Berivan Isik, Francesco Pase, Deniz Gunduz, Sanmi Koyejo, Tsachy Weissman, Michele Zorzi

    Abstract: The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods -- in which the client $n$ sends a sample from a client-only probability distribution $q_{φ^{(n)}}$, and the server estimat… ▽ More

    Submitted 21 April, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: Published at the International Conference on Artificial Intelligence and Statistics (AISTATS), 2024

  30. arXiv:2306.11698  [pdf, other

    cs.CL cs.AI cs.CR

    DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

    Authors: Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li

    Abstract: Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance -- where mistakes can be costly. To thi… ▽ More

    Submitted 26 February, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 Outstanding Paper (Datasets and Benchmarks Track)

  31. arXiv:2306.01799  [pdf, other

    cs.GT cs.IR cs.LG

    Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions

    Authors: Boxiang Lyu, Zhe Feng, Zachary Robertson, Sanmi Koyejo

    Abstract: We study the design of loss functions for click-through rates (CTR) to optimize (social) welfare in advertising auctions. Existing works either only focus on CTR predictions without consideration of business objectives (e.g., welfare) in auctions or assume that the distribution over the participants' expected cost-per-impression (eCPM) is known a priori, then use various additional assumptions on… ▽ More

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 25 pages, 6 figures

  32. arXiv:2305.18766  [pdf, other

    cs.CV cs.AI cs.LG

    HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance

    Authors: Junzhe Zhu, Peiye Zhuang, Sanmi Koyejo

    Abstract: The advancements in automatic text-to-3D generation have been remarkable. Most existing methods use pre-trained text-to-image diffusion models to optimize 3D representations like Neural Radiance Fields (NeRFs) via latent-space denoising score matching. Yet, these methods often result in artifacts and inconsistencies across different views due to their suboptimal optimization approaches and limited… ▽ More

    Submitted 11 March, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Project page: https://hifa-team.github.io/HiFA-site/

  33. arXiv:2304.15004  [pdf, other

    cs.AI cs.LG

    Are Emergent Abilities of Large Language Models a Mirage?

    Authors: Rylan Schaeffer, Brando Miranda, Sanmi Koyejo

    Abstract: Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an… ▽ More

    Submitted 22 May, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

  34. arXiv:2304.02190  [pdf, other

    cs.LG cs.AI cs.CY

    Globalizing Fairness Attributes in Machine Learning: A Case Study on Health in Africa

    Authors: Mercy Nyamewaa Asiedu, Awa Dieng, Abigail Oppong, Maria Nagawa, Sanmi Koyejo, Katherine Heller

    Abstract: With growing machine learning (ML) applications in healthcare, there have been calls for fairness in ML to understand and mitigate ethical concerns these systems may pose. Fairness has implications for global health in Africa, which already has inequitable power imbalances between the Global North and South. This paper seeks to explore fairness for global health, with Africa as a case study. We pr… ▽ More

    Submitted 4 April, 2023; originally announced April 2023.

  35. arXiv:2303.00177  [pdf, ps, other

    cs.LG cs.AI

    Finite-sample Guarantees for Nash Q-learning with Linear Function Approximation

    Authors: Pedro Cisneros-Velarde, Sanmi Koyejo

    Abstract: Nash Q-learning may be considered one of the first and most known algorithms in multi-agent reinforcement learning (MARL) for learning policies that constitute a Nash equilibrium of an underlying general-sum Markov game. Its original proof provided asymptotic guarantees and was for the tabular case. Recently, finite-sample guarantees have been provided using more modern RL techniques for the tabul… ▽ More

    Submitted 28 February, 2023; originally announced March 2023.

    Comments: 25 pages. arXiv admin note: text overlap with arXiv:2205.15891

  36. arXiv:2302.05049  [pdf, other

    cs.LG

    Principled Federated Domain Adaptation: Gradient Projection and Auto-Weighting

    Authors: Enyi Jiang, Yibo Jacky Zhang, Sanmi Koyejo

    Abstract: Federated Domain Adaptation (FDA) describes the federated learning (FL) setting where source clients and a server work collaboratively to improve the performance of a target client where limited data is available. The domain shift between the source and target domains, coupled with limited data of the target client, makes FDA a challenging problem, e.g., common techniques such as federated averagi… ▽ More

    Submitted 24 March, 2024; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: ICLR 2024

  37. arXiv:2212.11254  [pdf, other

    stat.ML cs.AI cs.LG

    Adapting to Latent Subgroup Shifts via Concepts and Proxies

    Authors: Ibrahim Alabdulmohsin, Nicole Chiou, Alexander D'Amour, Arthur Gretton, Sanmi Koyejo, Matt J. Kusner, Stephen R. Pfohl, Olawale Salaudeen, Jessica Schrouff, Katherine Tsai

    Abstract: We address the problem of unsupervised domain adaptation when the source domain differs from the target domain because of a shift in the distribution of a latent subgroup. When this subgroup confounds all observed data, neither covariate shift nor label shift assumptions apply. We show that the optimal target predictor can be non-parametrically identified with the help of concept and proxy variabl… ▽ More

    Submitted 21 December, 2022; originally announced December 2022.

    Comments: Authors listed in alphabetical order

  38. arXiv:2210.17237  [pdf, other

    stat.ME cs.LG stat.ML

    Latent Multimodal Functional Graphical Model Estimation

    Authors: Katherine Tsai, Boxin Zhao, Sanmi Koyejo, Mladen Kolar

    Abstract: Joint multimodal functional data acquisition, where functional data from multiple modes are measured simultaneously from the same subject, has emerged as an exciting modern approach enabled by recent engineering breakthroughs in the neurological and biological sciences. One prominent motivation to acquire such data is to enable new discoveries of the underlying connectivity by combining multimodal… ▽ More

    Submitted 1 October, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

  39. arXiv:2210.08974  [pdf

    cs.CY

    Coordinated Science Laboratory 70th Anniversary Symposium: The Future of Computing

    Authors: Klara Nahrstedt, Naresh Shanbhag, Vikram Adve, Nancy Amato, Romit Roy Choudhury, Carl Gunter, Nam Sung Kim, Olgica Milenkovic, Sayan Mitra, Lav Varshney, Yurii Vlasov, Sarita Adve, Rashid Bashir, Andreas Cangellaris, James DiCarlo, Katie Driggs-Campbell, Nick Feamster, Mattia Gazzola, Karrie Karahalios, Sanmi Koyejo, Paul Kwiat, Bo Li, Negar Mehr, Ravish Mehra, Andrew Miller , et al. (3 additional authors not shown)

    Abstract: In 2021, the Coordinated Science Laboratory CSL, an Interdisciplinary Research Unit at the University of Illinois Urbana-Champaign, hosted the Future of Computing Symposium to celebrate its 70th anniversary. CSL's research covers the full computing stack, computing's impact on society and the resulting need for social responsibility. In this white paper, we summarize the major technological points… ▽ More

    Submitted 4 October, 2022; originally announced October 2022.

  40. arXiv:2210.01834  [pdf, other

    cs.LG cs.CR

    Invariant Aggregator for Defending against Federated Backdoor Attacks

    Authors: Xiaoyang Wang, Dimitrios Dimitriadis, Sanmi Koyejo, Shruti Tople

    Abstract: Federated learning enables training high-utility models across several clients without directly sharing their private data. As a downside, the federated setting makes the model vulnerable to various adversarial attacks in the presence of malicious clients. Despite the theoretical and empirical success in defending against attacks that aim to degrade models' utility, defense against backdoor attack… ▽ More

    Submitted 8 March, 2024; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: AISTATS 2024 camera-ready

  41. arXiv:2209.04030  [pdf, other

    cs.CR cs.AI

    Unraveling the Connections between Privacy and Certified Robustness in Federated Learning Against Poisoning Attacks

    Authors: Chulin Xie, Yunhui Long, Pin-Yu Chen, Qinbin Li, Arash Nourian, Sanmi Koyejo, Bo Li

    Abstract: Federated learning (FL) provides an efficient paradigm to jointly train a global model leveraging data from distributed users. As local training data comes from different users who may not be trustworthy, several studies have shown that FL is vulnerable to poisoning attacks. Meanwhile, to protect the privacy of local users, FL is usually trained in a differentially private way (DPFL). Thus, in thi… ▽ More

    Submitted 30 December, 2023; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: ACM CCS 2023

  42. arXiv:2208.01545  [pdf, other

    cs.LG

    The Curse of Low Task Diversity: On the Failure of Transfer Learning to Outperform MAML and Their Empirical Equivalence

    Authors: Brando Miranda, Patrick Yu, Yu-Xiong Wang, Sanmi Koyejo

    Abstract: Recently, it has been observed that a transfer learning solution might be all we need to solve many few-shot learning benchmarks -- thus raising important questions about when and how meta-learning algorithms should be deployed. In this paper, we seek to clarify these questions by 1. proposing a novel metric -- the diversity coefficient -- to measure the diversity of tasks in a few-shot learning b… ▽ More

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2112.13121

  43. arXiv:2205.15891  [pdf, ps, other

    cs.LG stat.ML

    One Policy is Enough: Parallel Exploration with a Single Policy is Near-Optimal for Reward-Free Reinforcement Learning

    Authors: Pedro Cisneros-Velarde, Boxiang Lyu, Sanmi Koyejo, Mladen Kolar

    Abstract: Although parallelism has been extensively used in reinforcement learning (RL), the quantitative effects of parallel exploration are not well understood theoretically. We study the benefits of simple parallel exploration for reward-free RL in linear Markov decision processes (MDPs) and two-player zero-sum Markov games (MGs). In contrast to the existing literature, which focuses on approaches that e… ▽ More

    Submitted 1 March, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: 50 pages

  44. arXiv:2205.01094  [pdf, other

    cs.CR cs.LG q-fin.ST

    A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Predictions

    Authors: Yong Xie, Dakuo Wang, Pin-Yu Chen, **jun Xiong, Sijia Liu, Sanmi Koyejo

    Abstract: More and more investors and machine learning models rely on social media (e.g., Twitter and Reddit) to gather real-time information and sentiment to predict stock price movements. Although text-based models are known to be vulnerable to adversarial attacks, whether stock prediction models have similar vulnerability is underexplored. In this paper, we experiment with a variety of adversarial attack… ▽ More

    Submitted 12 July, 2022; v1 submitted 1 May, 2022; originally announced May 2022.

    Comments: NAACL short paper, github: https://github.com/yonxie/AdvFinTweet

  45. arXiv:2201.12947  [pdf, other

    stat.ML cs.LG

    Fair Wrap** for Black-box Predictions

    Authors: Alexander Soen, Ibrahim Alabdulmohsin, Sanmi Koyejo, Yishay Mansour, Nyalleng Moorosi, Richard Nock, Ke Sun, Lexing Xie

    Abstract: We introduce a new family of techniques to post-process ("wrap") a black-box classifier in order to reduce its bias. Our technique builds on the recent analysis of improper loss functions whose optimization can correct any twist in prediction, unfairness being treated as a twist. In the post-processing, we learn a wrapper function which we define as an $α$-tree, which modifies the prediction. We p… ▽ More

    Submitted 1 November, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: Published in Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

  46. arXiv:2112.13137  [pdf, other

    cs.LG cs.AI cs.CV cs.NE

    Does MAML Only Work via Feature Re-use? A Data Centric Perspective

    Authors: Brando Miranda, Yu-Xiong Wang, Sanmi Koyejo

    Abstract: Recent work has suggested that a good embedding is all we need to solve many few-shot learning benchmarks. Furthermore, other work has strongly suggested that Model Agnostic Meta-Learning (MAML) also works via this same method - by learning a good embedding. These observations highlight our lack of understanding of what meta-learning algorithms are doing and when they work. In this work, we provid… ▽ More

    Submitted 24 December, 2021; originally announced December 2021.

    Comments: 15 pages, 12 figures

  47. arXiv:2112.13121   

    cs.LG cs.AI cs.CV cs.NE

    The Curse of Zero Task Diversity: On the Failure of Transfer Learning to Outperform MAML and their Empirical Equivalence

    Authors: Brando Miranda, Yu-Xiong Wang, Sanmi Koyejo

    Abstract: Recently, it has been observed that a transfer learning solution might be all we need to solve many few-shot learning benchmarks -- thus raising important questions about when and how meta-learning algorithms should be deployed. In this paper, we seek to clarify these questions by proposing a novel metric -- the diversity coefficient -- to measure the diversity of tasks in a few-shot learning benc… ▽ More

    Submitted 28 November, 2022; v1 submitted 24 December, 2021; originally announced December 2021.

    Comments: An updated version with updated correction is at arXiv:2208.01545 and it's acompanying neurips submission is at https://brando90.github.io/brandomiranda/publications.html

  48. arXiv:2107.06917  [pdf, other

    cs.LG

    A Field Guide to Federated Optimization

    Authors: Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz , et al. (28 additional authors not shown)

    Abstract: Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and… ▽ More

    Submitted 14 July, 2021; originally announced July 2021.

  49. arXiv:2007.05597  [pdf, other

    eess.IV cs.CV cs.LG

    EMIXER: End-to-end Multimodal X-ray Generation via Self-supervision

    Authors: Siddharth Biswal, Peiye Zhuang, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Jimeng Sun

    Abstract: Deep generative models have enabled the automated synthesis of high-quality data for diverse applications. However, the most effective generative models are specialized to data from a single domain (e.g., images or text). Real-world applications such as healthcare require multi-modal data from multiple domains (e.g., both images and corresponding text), which are difficult to acquire due to limite… ▽ More

    Submitted 15 January, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

  50. arXiv:2006.13485  [pdf, other

    cs.LG stat.ML

    Fairness with Overlap** Groups

    Authors: Forest Yang, Moustapha Cisse, Sanmi Koyejo

    Abstract: In algorithmically fair prediction problems, a standard goal is to ensure the equality of fairness metrics across multiple overlap** groups simultaneously. We reconsider this standard fair classification problem using a probabilistic population analysis, which, in turn, reveals the Bayes-optimal classifier. Our approach unifies a variety of existing group-fair classification methods and enables… ▽ More

    Submitted 24 June, 2020; originally announced June 2020.