
Showing 1–50 of 55 results for author: Weller, A

Searching in archive stat.
  1. arXiv:2406.13493

    cs.LG stat.ML

    In-Context In-Context Learning with Transformer Neural Processes

    Authors: Matthew Ashman, Cristiana Diaconu, Adrian Weller, Richard E. Turner

    Abstract: Neural processes (NPs) are a powerful family of meta-learning models that seek to approximate the posterior predictive map of the ground-truth stochastic process from which each dataset in a meta-dataset is sampled. There are many cases in which practitioners, besides having access to the dataset of interest, may also have access to other datasets that share similarities with it. In this case, int…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2406.13488

    stat.ML cs.LG

    Approximately Equivariant Neural Processes

    Authors: Matthew Ashman, Cristiana Diaconu, Adrian Weller, Wessel Bruinsma, Richard E. Turner

    Abstract: Equivariant deep learning architectures exploit symmetries in learning problems to improve the sample efficiency of neural-network-based models and their ability to generalise. However, when modelling real-world data, learning problems are often not exactly equivariant, but only approximately. For example, when estimating the global temperature field from weather station observations, local topogr…

    Submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2406.08391

    cs.LG cs.AI cs.CL stat.ML

    Large Language Models Must Be Taught to Know What They Don't Know

    Authors: Sanyam Kapoor, Nate Gruver, Manley Roberts, Katherine Collins, Arka Pal, Umang Bhatt, Adrian Weller, Samuel Dooley, Micah Goldblum, Andrew Gordon Wilson

    Abstract: When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibrati…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Code available at: https://github.com/activatedgeek/calibration-tuning

  4. arXiv:2405.16541

    stat.ML cs.LG

    Variance-Reducing Couplings for Random Features: Perspectives from Optimal Transport

    Authors: Isaac Reid, Stratis Markou, Krzysztof Choromanski, Richard E. Turner, Adrian Weller

    Abstract: Random features (RFs) are a popular technique to scale up kernel methods in machine learning, replacing exact kernel evaluations with stochastic Monte Carlo estimates. They underpin models as diverse as efficient transformers (by approximating attention) to sparse spectrum Gaussian processes (by approximating the covariance function). Efficiency can be further improved by speeding up the convergen…

    Submitted 26 May, 2024; originally announced May 2024.

  5. arXiv:2310.15786

    stat.ML cs.LG

    Amortised Inference in Neural Networks for Small-Scale Probabilistic Meta-Learning

    Authors: Matthew Ashman, Tommy Rochussen, Adrian Weller

    Abstract: The global inducing point variational approximation for BNNs is based on using a set of inducing inputs to construct a series of conditional distributions that accurately approximate the conditionals of the true posterior distribution. Our key insight is that these inducing inputs can be replaced by the actual data, such that the variational distribution consists of a set of approximate likelihood…

    Submitted 24 October, 2023; originally announced October 2023.

  6. arXiv:2310.04859

    stat.ML cs.LG

    General Graph Random Features

    Authors: Isaac Reid, Krzysztof Choromanski, Eli Berger, Adrian Weller

    Abstract: We propose a novel random walk-based algorithm for unbiased estimation of arbitrary functions of a weighted adjacency matrix, coined universal graph random features (u-GRFs). This includes many of the most popular examples of kernels defined on the nodes of a graph. Our algorithm enjoys subquadratic time complexity with respect to the number of nodes, overcoming the notoriously prohibitive cubic s…

    Submitted 24 May, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

  7. arXiv:2310.04854

    stat.ML cs.LG

    Repelling Random Walks

    Authors: Isaac Reid, Eli Berger, Krzysztof Choromanski, Adrian Weller

    Abstract: We present a novel quasi-Monte Carlo mechanism to improve graph-based sampling, coined repelling random walks. By inducing correlations between the trajectories of an interacting ensemble such that their marginal transition probabilities are unmodified, we are able to explore the graph more efficiently, improving the concentration of statistical estimators whilst leaving them unbiased. The mechani…

    Submitted 24 May, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

  8. arXiv:2305.12470

    stat.ML cs.LG

    Quasi-Monte Carlo Graph Random Features

    Authors: Isaac Reid, Krzysztof Choromanski, Adrian Weller

    Abstract: We present a novel mechanism to improve the accuracy of the recently-introduced class of graph random features (GRFs). Our method induces negative correlations between the lengths of the algorithm's random walks by imposing antithetic termination: a procedure to sample more diverse random walks which may be of independent interest. It has a trivial drop-in implementation. We derive strong theoreti…

    Submitted 21 May, 2023; originally announced May 2023.

  9. arXiv:2303.06484

    cs.LG cs.CV stat.ML

    Generalizing and Decoupling Neural Collapse via Hyperspherical Uniformity Gap

    Authors: Weiyang Liu, Longhui Yu, Adrian Weller, Bernhard Schölkopf

    Abstract: The neural collapse (NC) phenomenon describes an underlying geometric symmetry for deep neural networks, where both deeply learned features and classifiers converge to a simplex equiangular tight frame. It has been shown that both cross-entropy loss and mean square error can provably lead to NC. We remove NC's key assumption on the feature dimension and the number of classes, and then present a ge…

    Submitted 15 April, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

    Comments: ICLR 2023 (v2: fixed typos)

  10. arXiv:2302.10701

    cs.LG cs.AI cs.IT stat.ML

    Scalable Infomin Learning

    Authors: Yanzhi Chen, Weihao Sun, Yingzhen Li, Adrian Weller

    Abstract: The task of infomin learning aims to learn a representation with high utility while being uninformative about a specified target, with the latter achieved by minimising the mutual information between the representation and the target. It has broad applications, ranging from training fair prediction models against protected attributes, to unsupervised learning with disentangled representations. Rec…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: 10 pages, accepted to NeurIPS 2022, slightly improved version

  11. arXiv:2302.00787

    cs.LG stat.ML

    FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features

    Authors: Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

    Abstract: The problem of efficient approximation of a linear operator induced by the Gaussian or softmax kernel is often addressed using random features (RFs) which yield an unbiased approximation of the operator's result. Such operators emerge in important applications ranging from kernel methods to efficient Transformers. We propose parameterized, positive, non-trigonometric RFs which approximate Gaussian…

    Submitted 1 February, 2023; originally announced February 2023.

  12. arXiv:2301.13856

    stat.ML cs.LG

    Simplex Random Features

    Authors: Isaac Reid, Krzysztof Choromanski, Valerii Likhosherstov, Adrian Weller

    Abstract: We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels by geometrical correlation of random projection vectors. We prove that SimRFs provide the smallest possible mean square error (MSE) on unbiased estimates of these kernels among the class of weight-independent geometrically-coupled positive random feature (…

    Submitted 7 October, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

  13. Racial Disparities in the Enforcement of Marijuana Violations in the US

    Authors: Bradley Butcher, Chris Robinson, Miri Zilka, Riccardo Fogliato, Carolyn Ashurst, Adrian Weller

    Abstract: Racial disparities in US drug arrest rates have been observed for decades, but their causes and policy implications are still contested. Some have argued that the disparities largely reflect differences in drug use between racial groups, while others have hypothesized that discriminatory enforcement policies and police practices play a significant role. In this work, we analyze racial disparities…

    Submitted 1 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: AAAI/ACM Conference on AI, Ethics, and Society 2022

  14. arXiv:2202.12275

    stat.ML cs.LG

    Partitioned Variational Inference: A Framework for Probabilistic Federated Learning

    Authors: Matthew Ashman, Thang D. Bui, Cuong V. Nguyen, Stratis Markou, Adrian Weller, Siddharth Swaroop, Richard E. Turner

    Abstract: The proliferation of computing devices has brought about an opportunity to deploy machine learning models on new problem domains using previously inaccessible data. Traditional algorithms for training such models often require data to be stored on a single machine with compute performed by a single node, making them unsuitable for decentralised training on multiple devices. This deficiency has mot…

    Submitted 28 April, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:1811.11206

  15. arXiv:2202.01315

    cs.LG stat.AP

    Approximating Full Conformal Prediction at Scale via Influence Functions

    Authors: Javier Abad, Umang Bhatt, Adrian Weller, Giovanni Cherubin

    Abstract: Conformal prediction (CP) is a wrapper around traditional machine learning models, giving coverage guarantees under the sole assumption of exchangeability; in classification problems, for a chosen significance level $\varepsilon$, CP guarantees that the error rate is at most $\varepsilon$, irrespective of whether the underlying model is misspecified. However, the prohibitive computational costs of…

    Submitted 22 February, 2023; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: 18 pages, 13 figures

  16. arXiv:2112.02646

    cs.LG cs.AI cs.CY stat.ML

    Diverse, Global and Amortised Counterfactual Explanations for Uncertainty Estimates

    Authors: Dan Ley, Umang Bhatt, Adrian Weller

    Abstract: To interpret uncertainty estimates from differentiable probabilistic models, recent work has proposed generating a single Counterfactual Latent Uncertainty Explanation (CLUE) for a given data point where the model is uncertain, identifying a single, on-manifold change to the input such that the model becomes more certain in its prediction. We broaden the exploration to examine $δ$-CLUE, the set of…

    Submitted 8 December, 2021; v1 submitted 5 December, 2021; originally announced December 2021.

    Comments: Accepted as a conference paper to AAAI 2022

  17. arXiv:2110.04367

    cs.LG stat.ML

    Hybrid Random Features

    Authors: Krzysztof Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, Deepali Jain, Michael S Ryoo, Jake Varley, Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller

    Abstract: We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) that automatically adapt the quality of kernel estimation to provide most accurate approximation in the defined regions of interest. Special instantiations of HRFs lead to well-known methods such as trigonometric (Rahimi and Recht, 2007) or (recently introduced in the…

    Submitted 30 January, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at ICLR 2022

  18. arXiv:2105.02725

    cs.LG cs.SI stat.ML

    CrossWalk: Fairness-enhanced Node Representation Learning

    Authors: Ahmad Khajehnejad, Moein Khajehnejad, Mahmoudreza Babaei, Krishna P. Gummadi, Adrian Weller, Baharan Mirzasoleiman

    Abstract: The potential for machine learning systems to amplify social inequities and unfairness is receiving increasing popular and academic attention. Much recent work has focused on developing algorithmic tools to assess and mitigate such unfairness. However, there is little work on enhancing fairness in graph algorithms. Here, we develop a simple, effective and general method, CrossWalk, that enhances f…

    Submitted 25 March, 2022; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: Association for the Advancement of Artificial Intelligence (AAAI) 2022

  19. arXiv:2104.06323

    cs.LG cs.AI stat.ML

    δ-CLUE: Diverse Sets of Explanations for Uncertainty Estimates

    Authors: Dan Ley, Umang Bhatt, Adrian Weller

    Abstract: To interpret uncertainty estimates from differentiable probabilistic models, recent work has proposed generating Counterfactual Latent Uncertainty Explanations (CLUEs). However, for a single input, such approaches could output a variety of explanations due to the lack of constraints placed on the explanation. Here we augment the original CLUE approach, to provide what we call $δ$-CLUE. CLUE indica…

    Submitted 3 December, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: Appeared as a workshop paper at ICLR 2021 (Responsible AI | Secure ML | Robust ML)

  20. arXiv:2010.06529

    cs.LG cs.AI stat.ML

    On the Fairness of Causal Algorithmic Recourse

    Authors: Julius von Kügelgen, Amir-Hossein Karimi, Umang Bhatt, Isabel Valera, Adrian Weller, Bernhard Schölkopf

    Abstract: Algorithmic fairness is typically studied from the perspective of predictions. Instead, here we investigate fairness from the perspective of recourse actions suggested to individuals to remedy an unfavourable classification. We propose two new fairness criteria at the group and individual level, which -- unlike prior work on equalising the average group-wise distance from the decision boundary --…

    Submitted 6 March, 2022; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: AAAI 2022 extended camera-ready version with technical appendices. (9 pages main paper + references + appendices)

  21. arXiv:2009.14794

    cs.LG cs.CL stat.ML

    Rethinking Attention with Performers

    Authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller

    Abstract: We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random featu…

    Submitted 19 November, 2022; v1 submitted 30 September, 2020; originally announced September 2020.

    Comments: Published as a conference paper + oral presentation at ICLR 2021. 38 pages. See https://github.com/google-research/google-research/tree/master/protein_lm for protein language model code, and https://github.com/google-research/google-research/tree/master/performer for Performer code. See https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html for Google AI Blog

  22. arXiv:2007.01174

    cs.LG stat.ML

    Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch

    Authors: Luca Viano, Yu-Ting Huang, Parameswaran Kamalaruban, Adrian Weller, Volkan Cevher

    Abstract: We study the inverse reinforcement learning (IRL) problem under a transition dynamics mismatch between the expert and the learner. Specifically, we consider the Maximum Causal Entropy (MCE) IRL learner model and provide a tight upper bound on the learner's performance degradation based on the $\ell_1$-distance between the transition dynamics of the expert and the learner. Leveraging insights from…

    Submitted 30 November, 2021; v1 submitted 2 July, 2020; originally announced July 2020.

  23. arXiv:2006.11421

    cs.LG math.CA math.DS math.OC stat.ML

    An Ode to an ODE

    Authors: Krzysztof Choromanski, Jared Quincy Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani

    Abstract: We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d). This nested system of two flows, where the parameter-flow is constrained to lie on the compact manifold, provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem wh…

    Submitted 22 June, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: 20 pages, 9 figures

  24. arXiv:2006.06848

    stat.ML cs.LG

    Getting a CLUE: A Method for Explaining Uncertainty Estimates

    Authors: Javier Antorán, Umang Bhatt, Tameem Adel, Adrian Weller, José Miguel Hernández-Lobato

    Abstract: Both uncertainty estimation and interpretability are important factors for trustworthy machine learning systems. However, there is little work at the intersection of these two areas. We address this gap by proposing a novel method for interpreting uncertainty estimates from differentiable probabilistic models, like Bayesian Neural Networks (BNNs). Our method, Counterfactual Latent Uncertainty Expl…

    Submitted 18 March, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: Accepted as an oral presentation at ICLR 2021

  25. arXiv:2006.03631

    cs.LG math.OC stat.ML

    UFO-BLO: Unbiased First-Order Bilevel Optimization

    Authors: Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

    Abstract: Bilevel optimization (BLO) is a popular approach with many applications including hyperparameter optimization, neural architecture search, adversarial robustness and model-agnostic meta-learning. However, the approach suffers from time and memory complexity proportional to the length $r$ of its inner optimization loop, which has led to several modifications being proposed. One such modification is…

    Submitted 7 June, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

  26. arXiv:2006.03555

    cs.LG cs.CL stat.ML

    Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers

    Authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, David Belanger, Lucy Colwell, Adrian Weller

    Abstract: Transformer models have achieved state-of-the-art results across a diverse range of domains. However, concern over the cost of training the attention mechanism to learn complex dependencies between distant inputs continues to grow. In response, solutions that exploit the structure and sparsity of the learned attention matrix have blossomed. However, real-world applications that involve long sequen…

    Submitted 30 September, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: This arXiv submission has been deprecated. Please see "Rethinking Attention with Performers" at arXiv:2009.14794 for the most updated version of the paper

  27. arXiv:2005.04074

    cs.LG cs.SI stat.ML

    Adversarial Graph Embeddings for Fair Influence Maximization over Social Networks

    Authors: Moein Khajehnejad, Ahmad Asgharian Rezaei, Mahmoudreza Babaei, Jessica Hoffmann, Mahdi Jalili, Adrian Weller

    Abstract: Influence maximization is a widely studied topic in network science, where the aim is to reach the maximum possible number of nodes, while only targeting a small initial set of individuals. It has critical applications in many fields, including viral marketing, information propagation, news dissemination, and vaccinations. However, the objective does not usually take into account whether the final…

    Submitted 10 May, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

    Comments: In Proc. of the 29th International Joint Conference on Artificial Intelligence (IJCAI'20), 2020

  28. arXiv:2005.01906

    cs.LG stat.ML

    Time Dependence in Non-Autonomous Neural ODEs

    Authors: Jared Quincy Davis, Krzysztof Choromanski, Jake Varley, Honglak Lee, Jean-Jacques Slotine, Valerii Likhosterov, Adrian Weller, Ameesh Makadia, Vikas Sindhwani

    Abstract: Neural Ordinary Differential Equations (ODEs) are elegant reinterpretations of deep networks where continuous time can replace the discrete notion of depth, ODE solvers perform forward propagation, and the adjoint method enables efficient, constant memory backpropagation. Neural ODEs are universal approximators only when they are non-autonomous, that is, the dynamics depends explicitly on time. We…

    Submitted 6 May, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

  29. arXiv:2005.00631

    cs.LG cs.AI cs.CY stat.ML

    Evaluating and Aggregating Feature-based Model Explanations

    Authors: Umang Bhatt, Adrian Weller, José M. F. Moura

    Abstract: A feature-based model explanation denotes how much each input feature contributes to a model's output for a given data point. As the number of proposed explanation functions grows, we lack quantitative evaluation criteria to help practitioners know when to use which explanation function. This paper proposes quantitative evaluation criteria for feature-based explanations: low sensitivity, high fait…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: Accepted at IJCAI 2020

  30. arXiv:2004.08675

    cs.LG stat.ML

    CWY Parametrization: a Solution for Parallelized Optimization of Orthogonal and Stiefel Matrices

    Authors: Valerii Likhosherstov, Jared Davis, Krzysztof Choromanski, Adrian Weller

    Abstract: We introduce an efficient approach for optimization over orthogonal groups on highly parallel computation units such as GPUs or TPUs. As in earlier work, we parametrize an orthogonal matrix as a product of Householder reflections. However, to overcome low parallelization capabilities of computing Householder reflections sequentially, we propose employing an accumulation scheme called the compact W…

    Submitted 16 February, 2021; v1 submitted 18 April, 2020; originally announced April 2020.

    Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130. Copyright 2021 by the author(s)

  31. arXiv:2004.04690

    cs.LG cs.CV stat.ML

    Orthogonal Over-Parameterized Training

    Authors: Weiyang Liu, Rongmei Lin, Zhen Liu, James M. Rehg, Liam Paull, Li Xiong, Le Song, Adrian Weller

    Abstract: The inductive bias of a neural network is largely determined by the architecture and the training algorithm. To achieve good generalization, how to effectively train a neural network is of great importance. We propose a novel orthogonal over-parameterized training (OPT) framework that can provably minimize the hyperspherical energy which characterizes the diversity of neurons on a hypersphere. By…

    Submitted 4 June, 2021; v1 submitted 9 April, 2020; originally announced April 2020.

    Comments: CVPR 2021 Oral (43 Pages, Substantial Update from v3, Typos Fixed from v5)

  32. arXiv:2003.13563

    cs.LG stat.ML

    Stochastic Flows and Geometric Optimization on the Orthogonal Group

    Authors: Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

    Abstract: We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$. We theoretically and experimentally demonstrate that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinf…

    Submitted 30 March, 2020; originally announced March 2020.

  33. arXiv:1910.13983

    cs.LG cs.CY stat.ML

    DADI: Dynamic Discovery of Fair Information with Adversarial Reinforcement Learning

    Authors: Michiel A. Bakker, Duy Patrick Tu, Humberto Riverón Valdés, Krishna P. Gummadi, Kush R. Varshney, Adrian Weller, Alex Pentland

    Abstract: We introduce a framework for dynamic adversarial discovery of information (DADI), motivated by a scenario where information (a feature set) is used by third parties with unknown objectives. We train a reinforcement learning agent to sequentially acquire a subset of the information while balancing accuracy and fairness of predictors downstream. Based on the set of already acquired features, the age…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: Accepted at NeurIPS 2019 HCML Workshop

  34. arXiv:1910.03962

    stat.ML cs.LG

    Optimal experimental design via Bayesian optimization: active causal structure learning for Gaussian process networks

    Authors: Julius von Kügelgen, Paul K Rubenstein, Bernhard Schölkopf, Adrian Weller

    Abstract: We study the problem of causal discovery through targeted interventions. Starting from few observational measurements, we follow a Bayesian active learning approach to perform those experiments which, in expectation with respect to the current model, are maximally informative about the underlying causal structure. Unlike previous work, we consider the setting of continuous random variables with no…

    Submitted 9 October, 2019; originally announced October 2019.

    Comments: Working paper. Accepted as a poster at the NeurIPS 2019 workshop, "Do the right thing": machine learning and causal inference for improved decision making. (6 pages + references + appendix)

  35. arXiv:1909.06342

    cs.LG cs.AI cs.CY cs.HC stat.ML

    Explainable Machine Learning in Deployment

    Authors: Umang Bhatt, Alice Xiang, Shubham Sharma, Adrian Weller, Ankur Taly, Yunhan Jia, Joydeep Ghosh, Ruchir Puri, José M. F. Moura, Peter Eckersley

    Abstract: Explainable machine learning offers the potential to provide stakeholders with insights into model behavior by using various methods such as feature importance scores, counterfactual explanations, or influential training data. Yet there is little understanding of how organizations use these methods in practice. This study explores how organizations view and use explainability for stakeholder consu…

    Submitted 10 July, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

    Comments: ACM Conference on Fairness, Accountability, and Transparency 2020

  36. arXiv:1907.01040

    cs.LG cs.CY stat.ML

    The Sensitivity of Counterfactual Fairness to Unmeasured Confounding

    Authors: Niki Kilbertus, Philip J. Ball, Matt J. Kusner, Adrian Weller, Ricardo Silva

    Abstract: Causal approaches to fairness have seen substantial recent interest, both from the machine learning community and from wider parties interested in ethical prediction algorithms. In no small part, this has been due to the fact that causal models allow one to simultaneously leverage data and expert knowledge to remove discriminatory effects from predictions. However, one of the primary assumptions i…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: published at UAI 2019

  37. arXiv:1905.10395

    cs.LG cs.DC math.OC stat.ML

    Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension

    Authors: Yunfei Teng, Wenbo Gao, Francois Chalus, Anna Choromanska, Donald Goldfarb, Adrian Weller

    Abstract: We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation d…

    Submitted 28 April, 2022; v1 submitted 24 May, 2019; originally announced May 2019.

    Comments: Extension of LSGD published in NeurIPS 2019. 24 pages

  38. arXiv:1903.03784

    stat.ML cs.LG

    Orthogonal Estimation of Wasserstein Distances

    Authors: Mark Rowland, Jiri Hron, Yunhao Tang, Krzysztof Choromanski, Tamas Sarlos, Adrian Weller

    Abstract: Wasserstein distances are increasingly used in a wide variety of applications in machine learning. Sliced Wasserstein distances form an important subclass which may be estimated efficiently through one-dimensional sorting operations. In this paper, we propose a new variant of sliced Wasserstein distance, study the use of orthogonal coupling in Monte Carlo estimation of Wasserstein distances and dr…

    Submitted 5 April, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

    Comments: Published at AISTATS 2019

  39. arXiv:1812.01339

    stat.ML cs.LG

    Self-Guided Belief Propagation -- A Homotopy Continuation Method

    Authors: Christian Knoll, Adrian Weller, Franz Pernkopf

    Abstract: Belief propagation (BP) is a popular method for performing probabilistic inference on graphical models. In this work, we enhance BP and propose self-guided belief propagation (SBP) that incorporates the pairwise potentials only gradually. This homotopy continuation method converges to a unique solution and increases the accuracy without increasing the computational burden. We provide a formal anal…

    Submitted 19 March, 2021; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  40. arXiv:1807.01308   

    stat.ML cs.LG

    Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018)

    Authors: Been Kim, Kush R. Varshney, Adrian Weller

    Abstract: This is the Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), which was held in Stockholm, Sweden, July 14, 2018. Invited speakers were Barbara Engelhardt, Cynthia Rudin, Fernanda Viégas, and Martin Wattenberg.

    Submitted 3 July, 2018; originally announced July 2018.

  41. arXiv:1807.00787  [pdf, other]

    cs.LG cs.CY stat.ML

    A Unified Approach to Quantifying Algorithmic Unfairness: Measuring Individual & Group Unfairness via Inequality Indices

    Authors: Till Speicher, Hoda Heidari, Nina Grgic-Hlaca, Krishna P. Gummadi, Adish Singla, Adrian Weller, Muhammad Bilal Zafar

    Abstract: Discrimination via algorithmic decision making has received considerable attention. Prior work largely focuses on defining conditions for fairness, but does not define satisfactory measures of algorithmic unfairness. In this paper, we focus on the following question: Given two unfair algorithms, how should we determine which of the two is more unfair? Our core idea is to use existing inequality in…

    Submitted 2 July, 2018; originally announced July 2018.

    Comments: 12 pages, 7 figures. To be published in: KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Proceedings

  42. arXiv:1806.03281  [pdf, other]

    stat.ML cs.CR cs.CY cs.LG

    Blind Justice: Fairness with Encrypted Sensitive Attributes

    Authors: Niki Kilbertus, Adrià Gascón, Matt J. Kusner, Michael Veale, Krishna P. Gummadi, Adrian Weller

    Abstract: Recent work has explored how to train machine learning models which do not discriminate against any subgroup of the population as determined by sensitive attributes such as gender or race. To avoid disparate treatment, sensitive attributes should not be considered. On the other hand, in order to avoid disparate impact, sensitive attributes must be examined, e.g., in order to learn a fair model, or…

    Submitted 8 June, 2018; originally announced June 2018.

    Comments: published at ICML 2018

    Journal ref: Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2630-2639, 2018

  43. arXiv:1804.02395  [pdf, other]

    cs.LG cs.RO stat.ML

    Structured Evolution with Compact Architectures for Scalable Policy Optimization

    Authors: Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard E. Turner, Adrian Weller

    Abstract: We present a new method of blackbox optimization via gradient approximation with the use of structured random orthogonal matrices, providing more accurate estimators than baselines and with provable theoretical guarantees. We show that this algorithm can be successfully applied to learn better quality compact policies than those using standard gradient estimation techniques. The compact policies w…

    Submitted 12 June, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

  44. Bucket Renormalization for Approximate Inference

    Authors: Sungsoo Ahn, Michael Chertkov, Adrian Weller, Jinwoo Shin

    Abstract: Probabilistic graphical models are a key tool in machine learning applications. Computing the partition function, i.e., normalizing constant, is a fundamental task of statistical inference but it is generally computationally intractable, leading to extensive study of approximation methods. Iterative variational methods are a popular and successful family of approaches. However, even state of the a…

    Submitted 20 March, 2018; v1 submitted 13 March, 2018; originally announced March 2018.

  45. arXiv:1802.09548  [pdf, other]

    stat.ML cs.CY cs.LG

    Human Perceptions of Fairness in Algorithmic Decision Making: A Case Study of Criminal Risk Prediction

    Authors: Nina Grgić-Hlača, Elissa M. Redmiles, Krishna P. Gummadi, Adrian Weller

    Abstract: As algorithms are increasingly used to make important decisions that affect human lives, ranging from social benefit assignment to predicting risk of criminal recidivism, concerns have been raised about the fairness of algorithmic decision making. Most prior works on algorithmic fairness normatively prescribe how fair decisions ought to be made. In contrast, here, we descriptively survey users for…

    Submitted 26 February, 2018; originally announced February 2018.

    Comments: To appear in the Proceedings of the Web Conference (WWW 2018). Code available at https://fate-computing.mpi-sws.org/procedural_fairness/

  46. arXiv:1801.01649  [pdf, other]

    stat.ML

    Gauged Mini-Bucket Elimination for Approximate Inference

    Authors: Sungsoo Ahn, Michael Chertkov, Jinwoo Shin, Adrian Weller

    Abstract: Computing the partition function $Z$ of a discrete graphical model is a fundamental inference challenge. Since this is computationally intractable, variational approximations are often used in practice. Recently, so-called gauge transformations were used to improve variational lower bounds on $Z$. In this paper, we propose a new gauge-variational approach, termed WMBE-G, which combines gauge trans…

    Submitted 4 March, 2018; v1 submitted 5 January, 2018; originally announced January 2018.

  47. arXiv:1711.01134  [pdf]

    cs.AI stat.ML

    Accountability of AI Under the Law: The Role of Explanation

    Authors: Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O'Brien, Kate Scott, Stuart Schieber, James Waldo, David Weinberger, Adrian Weller, Alexandra Wood

    Abstract: The ubiquity of systems using artificial intelligence or "AI" has brought increasing attention to how those systems should be regulated. The choice of how to regulate AI systems will require care. AI systems have the potential to synthesize large amounts of data, allowing for greater levels of personalization and precision than ever before---applications range from clinical decision support to aut…

    Submitted 20 December, 2019; v1 submitted 3 November, 2017; originally announced November 2017.

  48. arXiv:1708.02666   

    stat.ML cs.LG

    Proceedings of the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017)

    Authors: Been Kim, Dmitry M. Malioutov, Kush R. Varshney, Adrian Weller

    Abstract: This is the Proceedings of the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), which was held in Sydney, Australia, August 10, 2017. Invited speakers were Tony Jebara, Pang Wei Koh, and David Sontag.

    Submitted 8 August, 2017; originally announced August 2017.

  49. arXiv:1707.00010  [pdf, other]

    stat.ML cs.LG

    From Parity to Preference-based Notions of Fairness in Classification

    Authors: Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, Krishna P. Gummadi, Adrian Weller

    Abstract: The adoption of automated, data-driven decision making in an ever expanding range of applications has raised concerns about its potential unfairness towards certain social groups. In this context, a number of recent studies have focused on defining, detecting, and removing unfairness from data-driven decision systems. However, the existing notions of fairness, based on parity (equality) in treatme…

    Submitted 28 November, 2017; v1 submitted 30 June, 2017; originally announced July 2017.

    Comments: To appear in Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017). Code available at: https://github.com/mbilalzafar/fair-classification

  50. arXiv:1706.10208  [pdf, other]

    stat.ML cs.LG

    On Fairness, Diversity and Randomness in Algorithmic Decision Making

    Authors: Nina Grgić-Hlača, Muhammad Bilal Zafar, Krishna P. Gummadi, Adrian Weller

    Abstract: Consider a binary decision making process where a single machine learning classifier replaces a multitude of humans. We raise questions about the resulting loss of diversity in the decision making process. We study the potential benefits of using random classifier ensembles instead of a single classifier in the context of fairness-aware learning and demonstrate various attractive properties: (i) a…

    Submitted 30 June, 2017; originally announced June 2017.

    Comments: Presented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017)