
Showing 1–50 of 96 results for author: Jegelka, S

Searching in archive cs.
  1. arXiv:2406.03682 [pdf, other]

    cs.LG

    A Universal Class of Sharpness-Aware Minimization Algorithms

    Authors: Behrooz Tahmasebi, Ashkan Soleymani, Dara Bahri, Stefanie Jegelka, Patrick Jaillet

    Abstract: Recently, there has been a surge in interest in developing optimization algorithms for overparameterized models as achieving generalization is believed to require algorithms with suitable biases. This interest centers on minimizing sharpness of the original loss function; the Sharpness-Aware Minimization (SAM) algorithm has proven effective. However, most literature only considers a few sharpness…

    Submitted 10 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024. Code is available at http://github.com/dbahri/universal_sam
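    Illustrative sketch (not from the paper): one step of the original SAM update that this work generalizes. The weights are first perturbed by rho * g / ||g|| toward higher loss, and the descent step then uses the gradient at the perturbed point. The quadratic loss, the constants A and B, and the names sam_step, rho and lr are hypothetical choices for illustration; the universal class proposed in the paper is not reproduced here.

```python
import math

# Hypothetical toy loss L(w) = 0.5 * (A * w0**2 + B * w1**2), chosen only for illustration.
A, B = 10.0, 1.0

def loss(w):
    return 0.5 * (A * w[0] ** 2 + B * w[1] ** 2)

def grad(w):
    return [A * w[0], B * w[1]]

def sam_step(w, lr=0.05, rho=0.05):
    """One standard SAM step: move to the (first-order) worst point in a rho-ball,
    then descend from w using the gradient evaluated there."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # guard against a zero gradient
    eps = [rho * gi / norm for gi in g]                # ascent perturbation of length rho
    g_adv = grad([wi + ei for wi, ei in zip(w, eps)])  # gradient at the perturbed weights
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
for _ in range(100):
    w = sam_step(w)
print(loss([1.0, 1.0]), loss(w))
```

    With a fixed rho the iterates settle into a small neighbourhood of the minimum rather than converging exactly, because the perturbation always has length rho.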

  2. arXiv:2405.20231 [pdf, other]

    cs.LG cs.AI stat.ML

    The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof

    Authors: Derek Lim, Moe Putterman, Robin Walters, Haggai Maron, Stefanie Jegelka

    Abstract: Many algorithms and observed phenomena in deep learning appear to be affected by parameter symmetries -- transformations of neural network parameters that do not change the underlying neural network function. These include linear mode connectivity, model merging, Bayesian neural network inference, metanetworks, and several other characteristics of optimization or loss-landscapes. However, theoreti…

    Submitted 20 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 27 pages. Preparing code for release. v2: added / updated some citations
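    Illustrative sketch (not from the paper): the textbook example of a parameter symmetry is permuting the hidden units of a two-layer MLP. Reordering the rows of the first weight matrix and its bias, together with the matching columns of the second weight matrix, changes the parameters but not the function. The sizes, random weights, permutation, and helper names below are hypothetical.

```python
import random

def relu(x):
    return max(0.0, x)

def mlp(x, W1, b1, W2):
    # Two-layer MLP without output bias: y = W2 @ relu(W1 @ x + b1)
    h = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) for row in W2]

def permute_hidden(W1, b1, W2, perm):
    # Reorder hidden units: rows of W1 and entries of b1, and the matching columns of W2.
    W1p = [W1[p] for p in perm]
    b1p = [b1[p] for p in perm]
    W2p = [[row[p] for p in perm] for row in W2]
    return W1p, b1p, W2p

random.seed(0)
d_in, d_hidden, d_out = 3, 5, 2
W1 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [random.gauss(0, 1) for _ in range(d_hidden)]
W2 = [[random.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
perm = [4, 2, 0, 3, 1]

x = [0.5, -1.0, 2.0]
y = mlp(x, W1, b1, W2)
y_perm = mlp(x, *permute_hidden(W1, b1, W2, perm))
print(y, y_perm)  # equal up to floating-point summation order
```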

  3. arXiv:2405.18781 [pdf, other]

    cs.LG stat.ML

    On the Role of Attention Masks and LayerNorm in Transformers

    Authors: Xinyi Wu, Amir Ajorlou, Yifei Wang, Stefanie Jegelka, Ali Jadbabaie

    Abstract: Self-attention is the key mechanism of transformers, which are the essential building blocks of modern foundation models. Recent studies have shown that pure self-attention suffers from an increasing degree of rank collapse as depth increases, limiting model expressivity and further utilization of model depth. The existing literature on rank collapse, however, has mostly overlooked other critical…

    Submitted 29 May, 2024; originally announced May 2024.
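    Simplified illustration of rank collapse (an assumption-laden toy, not the paper's analysis, which also covers attention masks and LayerNorm): if every layer applies the same strictly positive, row-stochastic attention matrix with no residual connections, MLPs, or normalization, the token representations X_k = A^k X approach a matrix whose rows are all equal, i.e. rank one. The attention scores, token features, and the helper spread are hypothetical.

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def spread(X):
    # Largest per-feature gap between tokens; 0 means all rows are identical (rank <= 1).
    return max(max(col) - min(col) for col in zip(*X))

# Hypothetical attention scores for 3 tokens; softmax makes each row a probability vector.
scores = [[1.0, 0.2, -0.5], [0.3, 0.8, 0.1], [-0.2, 0.4, 0.9]]
A = [softmax(r) for r in scores]

# Hypothetical features: 3 tokens x 2 dimensions.
X = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]

for depth in range(1, 31):
    X = matmul(A, X)  # one "layer" of pure self-attention with fixed weights
    if depth in (1, 5, 30):
        print(depth, spread(X))
```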

  4. arXiv:2405.18634 [pdf, other]

    cs.LG cs.CL stat.ML

    A Theoretical Understanding of Self-Correction through In-context Alignment

    Authors: Yifei Wang, Yuyang Wu, Zeming Wei, Stefanie Jegelka, Yisen Wang

    Abstract: Going beyond mimicking limited human experiences, recent studies show initial evidence that, like humans, large language models (LLMs) are capable of improving their abilities purely by self-correction, i.e., correcting previous responses through self-examination, in certain circumstances. Nevertheless, little is known about how such capabilities arise. In this work, based on a simplified setup ak…

    Submitted 28 May, 2024; originally announced May 2024.

  5. arXiv:2405.18378 [pdf, other]

    cs.LG

    A Canonization Perspective on Invariant and Equivariant Learning

    Authors: George Ma, Yifei Wang, Derek Lim, Stefanie Jegelka, Yisen Wang

    Abstract: In many applications, we desire neural networks to exhibit invariance or equivariance to certain groups due to symmetries inherent in the data. Recently, frame-averaging methods have emerged as a unified framework for attaining symmetries efficiently by averaging over input-dependent subsets of the group, i.e., frames. What we currently lack is a principled understanding of the design of frames. In…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.
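    Illustrative sketch (the simplest possible case, not the paper's construction): for the permutation group acting on a list of scalars, sorting is a canonization map, so any function applied after sorting is permutation invariant. For comparison, averaging over the whole group is also shown; frame averaging replaces the full group with smaller input-dependent subsets. The function f and the inputs are hypothetical and deliberately non-invariant.

```python
from itertools import permutations

def f(x):
    # Hypothetical non-invariant function: depends on element positions.
    return sum((i + 1) * v for i, v in enumerate(x))

def canonize(x):
    # Canonization for the permutation group: map every ordering to one canonical order.
    return sorted(x)

def f_canon(x):
    return f(canonize(x))

def f_group_avg(x):
    # Average over all n! group elements; frames use input-dependent subsets instead.
    perms = list(permutations(x))
    return sum(f(p) for p in perms) / len(perms)

x = [3.0, -1.0, 2.0]
y = [2.0, 3.0, -1.0]                    # a permutation of x
print(f(x), f(y))                       # differ: f is not invariant
print(f_canon(x), f_canon(y))           # equal
print(f_group_avg(x), f_group_avg(y))   # equal
```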

  6. arXiv:2405.18193 [pdf, other]

    cs.LG cs.CV

    In-Context Symmetries: Self-Supervised Learning through Contextual World Models

    Authors: Sharut Gupta, Chenyu Wang, Yifei Wang, Tommi Jaakkola, Stefanie Jegelka

    Abstract: At the core of self-supervised learning for vision is the idea of learning invariant or equivariant representations with respect to a set of data transformations. This approach, however, introduces strong inductive biases, which can render the representations fragile in downstream tasks that do not conform to these symmetries. In this work, drawing insights from world models, we propose to instead…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 32 pages, 24 tables and 11 figures

  7. arXiv:2404.06694 [pdf, other]

    cs.LG cs.AI cs.CR

    How to Craft Backdoors with Unlabeled Data Alone?

    Authors: Yifei Wang, Wenhan Ma, Stefanie Jegelka, Yisen Wang

    Abstract: Relying only on unlabeled data, self-supervised learning (SSL) can learn rich features in an economical and scalable way. As the workhorse for building foundation models, SSL has received a lot of attention recently with wide applications, which also raises security concerns where backdoor attacks are a major type of threat: if the released dataset is maliciously poisoned, backdoored SSL models ca…

    Submitted 22 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted at ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models (DPFM)

  8. arXiv:2402.02287 [pdf, other]

    cs.LG cs.AI cs.DM cs.NE stat.ML

    Future Directions in the Theory of Graph Machine Learning

    Authors: Christopher Morris, Fabrizio Frasca, Nadav Dym, Haggai Maron, İsmail İlkan Ceylan, Ron Levie, Derek Lim, Michael Bronstein, Martin Grohe, Stefanie Jegelka

    Abstract: Machine learning on graphs, especially using graph neural networks (GNNs), has seen a surge in interest due to the wide availability of graph data across a broad spectrum of disciplines, from life to social and engineering sciences. Despite their practical success, our theoretical understanding of the properties of GNNs remains highly incomplete. Recent theoretical advancements primarily focus on…

    Submitted 14 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  9. arXiv:2401.01869 [pdf, other]

    cs.LG cs.DS math.ST stat.ML

    On the hardness of learning under symmetries

    Authors: Bobak T. Kiani, Thien Le, Hannah Lawrence, Stefanie Jegelka, Melanie Weber

    Abstract: We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural nets has empirically improved the performance of learning pipelines, in domains ranging from biology to computer vision. However, a rich yet separate line of learning theoretic research has demonstrated that actually learning shallow, fully-connected…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: 52 pages, 4 figures

  10. arXiv:2312.02339 [pdf, other]

    cs.LG cs.AI stat.ML

    Expressive Sign Equivariant Networks for Spectral Geometric Learning

    Authors: Derek Lim, Joshua Robinson, Stefanie Jegelka, Haggai Maron

    Abstract: Recent work has shown the utility of developing machine learning models that respect the structure and symmetries of eigenvectors. These works promote sign invariance, since for any eigenvector v the negation -v is also an eigenvector. However, we show that sign invariance is theoretically limited for tasks such as building orthogonally equivariant models and learning node positional encodings for…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023 Spotlight

  11. arXiv:2311.10610 [pdf, ps, other]

    cs.LG stat.ML

    A Poincaré Inequality and Consistency Results for Signal Sampling on Large Graphs

    Authors: Thien Le, Luana Ruiz, Stefanie Jegelka

    Abstract: Large-scale graph machine learning is challenging as the complexity of learning models scales with the graph size. Subsampling the graph is a viable alternative, but sampling on graphs is nontrivial as graphs are non-Euclidean. Existing graph sampling techniques require not only computing the spectra of large matrices but also repeating these computations when the graph changes, e.g., grows. In th…

    Submitted 25 March, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: 23 pages

  12. arXiv:2311.02868 [pdf, other]

    cs.LG

    Sample Complexity Bounds for Estimating Probability Divergences under Invariances

    Authors: Behrooz Tahmasebi, Stefanie Jegelka

    Abstract: Group-invariant probability distributions appear in many data-generative models in machine learning, such as graphs, point clouds, and images. In practice, one often needs to estimate divergences between such distributions. In this work, we study how the inherent invariances, with respect to any smooth action of a Lie group on a manifold, improve sample complexity when estimating the 1-Wasserstein…

    Submitted 5 June, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: ICML 2024

  13. arXiv:2310.02579 [pdf, other]

    cs.LG cs.AI

    On the Stability of Expressive Positional Encodings for Graphs

    Authors: Yinan Huang, William Lu, Joshua Robinson, Yu Yang, Muhan Zhang, Stefanie Jegelka, Pan Li

    Abstract: Designing effective positional encodings for graphs is key to building powerful graph transformers and enhancing message-passing graph neural networks. Although widespread, using Laplacian eigenvectors as positional encodings faces two fundamental challenges: (1) Non-uniqueness: there are many different eigendecompositions of the same Laplacian, and (2) Instability: small perturbatio…

    Submitted 8 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  14. arXiv:2310.00526 [pdf, other]

    cs.LG cs.AI cs.DM cs.DS

    Are Graph Neural Networks Optimal Approximation Algorithms?

    Authors: Morris Yau, Eric Lu, Nikolaos Karalias, Jessica Xu, Stefanie Jegelka

    Abstract: In this work we design graph neural network architectures that capture optimal approximation algorithms for a large class of combinatorial optimization problems, using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message-passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problem…

    Submitted 7 February, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

    Comments: Updated references, fixed more typos and wording issues

  15. arXiv:2309.09888 [pdf, other]

    cs.LG cs.AI stat.ML

    Context is Environment

    Authors: Sharut Gupta, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja

    Abstract: Two lines of work are taking center stage in AI research. On the one hand, the community is making increasing efforts to build models that discard spurious correlations and generalize better in novel test environments. Unfortunately, the bitter lesson so far is that no proposal convincingly outperforms a simple empirical risk minimization baseline. On the other hand, large language models (LL…

    Submitted 20 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: 41 pages, 4 figures

  16. arXiv:2306.13924 [pdf, other]

    cs.LG cs.CV

    Structuring Representation Geometry with Rotationally Equivariant Contrastive Learning

    Authors: Sharut Gupta, Joshua Robinson, Derek Lim, Soledad Villar, Stefanie Jegelka

    Abstract: Self-supervised learning converts raw perceptual data such as images to a compact space where simple Euclidean distances measure meaningful variations in data. In this paper, we extend this formulation by adding geometric structure to the embedding space by enforcing transformations of input space to correspond to simple (i.e., linear) transformations of embedding space. Specifically, i…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: 22 pages

  17. arXiv:2306.13239 [pdf, other]

    cs.LG

    The Inductive Bias of Flatness Regularization for Deep Matrix Factorization

    Authors: Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank Reddi, Tengyu Ma, Stefanie Jegelka

    Abstract: Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions. More explicit forms of flatness regularization also empirically improve the generalization performance. However, it remains unclear wh…

    Submitted 22 June, 2023; originally announced June 2023.
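    Illustrative sketch (not from the paper): the sharpness measure named in this abstract, the trace of the loss Hessian, can be estimated from central second differences along each coordinate without forming the Hessian. On the hypothetical scalar "depth-2 factorization" loss (w0 * w1 - 1)^2 the trace is 2 * (w0^2 + w1^2), so among zero-loss solutions the balanced one w0 = w1 = 1 is the flattest. The loss, step size h, and helper names are assumptions for illustration.

```python
def loss(w):
    # Hypothetical scalar "depth-2 factorization" loss: fit the product w0 * w1 to 1.
    return (w[0] * w[1] - 1.0) ** 2

def hessian_trace_fd(f, w, h=1e-3):
    # Sum of central second differences (f(w+h e_i) - 2 f(w) + f(w-h e_i)) / h^2.
    total = 0.0
    f0 = f(w)
    for i in range(len(w)):
        wp = list(w)
        wp[i] += h
        wm = list(w)
        wm[i] -= h
        total += (f(wp) - 2.0 * f0 + f(wm)) / h ** 2
    return total

for w in ([1.0, 1.0], [2.0, 0.5], [4.0, 0.25]):
    # All three are zero-loss solutions; analytically tr(H) = 2 * (w0**2 + w1**2).
    print(w, loss(w), hessian_trace_fd(loss, w))
```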

  18. arXiv:2306.04495 [pdf, ps, other]

    cs.LG cs.SI

    Limits, approximation and size transferability for GNNs on sparse graphs via graphops

    Authors: Thien Le, Stefanie Jegelka

    Abstract: Can graph neural networks generalize to graphs that are different from the graphs they were trained on, e.g., in size? In this work, we study this question from a theoretical perspective. While recent work established such transferability and approximation results via graph limits, e.g., via graphons, these only apply non-trivially to dense graphs. To include frequently encountered sparse graphs s…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 submission, 34 pages

  19. arXiv:2303.14269 [pdf, ps, other]

    cs.LG

    The Exact Sample Complexity Gain from Invariances for Kernel Regression

    Authors: Behrooz Tahmasebi, Stefanie Jegelka

    Abstract: In practice, encoding invariances into models improves sample complexity. In this work, we study this phenomenon from a theoretical perspective. In particular, we provide minimax optimal rates for kernel ridge regression on compact manifolds, with a target function that is invariant to a group action on the manifold. Our results hold for any smooth compact Lie group action, even groups of positive…

    Submitted 6 November, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

  20. arXiv:2302.07099 [pdf, other]

    physics.ins-det cs.LG

    Tetris-inspired detector with neural network for radiation mapping

    Authors: Ryotaro Okabe, Shangjie Xue, Jiankai Yu, Tongtong Liu, Benoit Forget, Stefanie Jegelka, Gordon Kohse, Lin-wen Hu, Mingda Li

    Abstract: In recent years, radiation mapping has attracted widespread research attention and increased public concerns on environmental monitoring. In terms of both materials and their configurations, radiation detectors have been developed to locate the directions and positions of the radiation sources. In this process, an algorithm is essential in converting detector signals to radiation source information.…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: 29 pages, 20 figures. Ryotaro Okabe and Shangjie Xue contributed equally to this work

  21. arXiv:2302.00070 [pdf, other]

    cs.LG cs.CV

    Debiasing Vision-Language Models via Biased Prompts

    Authors: Ching-Yao Chuang, Varun Jampani, Yuanzhen Li, Antonio Torralba, Stefanie Jegelka

    Abstract: Machine learning models have been shown to inherit biases from their training datasets. This can be particularly problematic for vision-language foundation models trained on uncurated datasets scraped from the internet. The biases can be amplified and propagated to downstream applications like zero-shot classifiers and text-to-image generative models. In this study, we propose a general approach f…

    Submitted 15 May, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

  22. arXiv:2301.11419 [pdf, other]

    cs.LG q-bio.QM

    Efficiently predicting high resolution mass spectra with graph neural networks

    Authors: Michael Murphy, Stefanie Jegelka, Ernest Fraenkel, Tobias Kind, David Healey, Thomas Butler

    Abstract: Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics. This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures. However, current approaches to spectrum prediction model the output space in ways that force a tradeoff between capturing…

    Submitted 26 January, 2023; originally announced January 2023.

  23. arXiv:2212.13669 [pdf, ps, other]

    cs.LG math.OC

    Optimal algorithms for group distributionally robust optimization and beyond

    Authors: Tasuku Soma, Khashayar Gatmiry, Stefanie Jegelka

    Abstract: Distributionally robust optimization (DRO) can improve the robustness and fairness of learning methods. In this paper, we devise stochastic algorithms for a class of DRO problems including group DRO, subpopulation fairness, and empirical conditional value at risk (CVaR) optimization. Our new algorithms achieve faster convergence rates than existing algorithms for multiple DRO settings. We also pro…

    Submitted 27 December, 2022; originally announced December 2022.
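    Illustrative sketch (not the paper's algorithms): for fixed per-example losses, two of the objectives named above can be evaluated directly. Group DRO takes the worst average loss over groups, and empirical CVaR at level alpha averages the worst alpha-fraction of losses (here assuming alpha * n is a positive integer). The losses, group labels, and function names are hypothetical; the stochastic optimization methods proposed in the paper are not reproduced.

```python
def group_dro_objective(losses, groups):
    """Worst-case average loss over groups (the group DRO objective for fixed losses)."""
    totals, counts = {}, {}
    for loss, g in zip(losses, groups):
        totals[g] = totals.get(g, 0.0) + loss
        counts[g] = counts.get(g, 0) + 1
    return max(totals[g] / counts[g] for g in totals)

def empirical_cvar(losses, alpha):
    """Average of the worst alpha-fraction of losses; assumes alpha * n is a positive integer."""
    k = round(alpha * len(losses))
    if k < 1:
        raise ValueError("alpha * n must be at least 1")
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k

losses = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5]   # hypothetical per-example losses
groups = ["a", "a", "b", "b", "c", "c"]   # hypothetical group labels
print(sum(losses) / len(losses))          # ERM objective: mean loss
print(group_dro_objective(losses, groups))
print(empirical_cvar(losses, alpha=0.5))
```

    Both objectives upweight the hardest part of the data relative to the plain mean, which is what makes them useful for robustness and subpopulation fairness.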

  24. arXiv:2210.03164 [pdf, other]

    cs.LG stat.ML

    InfoOT: Information Maximizing Optimal Transport

    Authors: Ching-Yao Chuang, Stefanie Jegelka, David Alvarez-Melis

    Abstract: Optimal transport aligns samples across distributions by minimizing the transportation cost between them, e.g., the geometric distances. Yet, it ignores coherence structure in the data such as clusters, does not handle outliers well, and cannot integrate new data points. To address these drawbacks, we propose InfoOT, an information-theoretic extension of optimal transport that maximizes the mutual…

    Submitted 29 May, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Journal ref: ICML 2023
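    Illustrative sketch of the plain optimal transport problem that InfoOT extends (this is not InfoOT): between two uniform empirical distributions with the same number of points, some optimal transport plan is a permutation (by the Birkhoff-von Neumann theorem), so for tiny inputs the exact plan can be found by brute force. The points, the squared Euclidean cost, and the function names are hypothetical choices.

```python
from itertools import permutations

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def ot_assignment(xs, ys):
    # Exact OT between uniform empirical measures of equal size, by brute-force
    # search over permutations (O(n!) time, so only for tiny n).
    n = len(xs)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        cost = sum(sq_dist(xs[i], ys[perm[i]]) for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm

xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ys = [(0.1, 1.0), (0.0, 0.1), (1.1, 0.0)]
cost, perm = ot_assignment(xs, ys)
print(cost, perm)  # perm[i] is the index of the target point matched to xs[i]
```

    The matching only looks at pairwise geometric cost, which is exactly the limitation the abstract points to: cluster structure in either point cloud plays no role.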

  25. arXiv:2210.01906 [pdf, other]

    cs.LG stat.ML

    Tree Mover's Distance: Bridging Graph Metrics and Stability of Graph Neural Networks

    Authors: Ching-Yao Chuang, Stefanie Jegelka

    Abstract: Understanding generalization and robustness of machine learning models fundamentally relies on assuming an appropriate metric on the data space. Identifying such a metric is particularly challenging for non-Euclidean data such as graphs. Here, we propose a pseudometric for attributed graphs, the Tree Mover's Distance (TMD), and study its relation to generalization. Via a hierarchical optimal trans…

    Submitted 4 October, 2022; originally announced October 2022.

    Journal ref: NeurIPS 2022

  26. arXiv:2208.07951 [pdf, other]

    cs.LG math.DS math.OC stat.ML

    On the generalization of learning algorithms that do not converge

    Authors: Nisha Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka

    Abstract: Generalization analyses of deep learning typically assume that the training converges to a fixed point. However, recent results indicate that in practice, the weights of deep neural networks optimized with stochastic gradient descent often oscillate indefinitely. To reduce this discrepancy between theory and practice, this paper focuses on the generalization of neural networks whose training dynamics…

    Submitted 19 August, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: 27 pages, under review

  27. arXiv:2208.04055 [pdf, other]

    cs.LG

    Neural Set Function Extensions: Learning with Discrete Functions in High Dimensions

    Authors: Nikolaos Karalias, Joshua Robinson, Andreas Loukas, Stefanie Jegelka

    Abstract: Integrating functions on discrete domains into neural networks is key to developing their capability to reason about discrete objects. However, discrete domains are (1) not naturally amenable to gradient-based optimization, and (2) incompatible with deep learning architectures that rely on representations in high-dimensional vector spaces. In this work, we address both difficulties for set functions,…

    Submitted 14 November, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

    Comments: NeurIPS 2022

  28. arXiv:2204.07697 [pdf, other]

    cs.LG stat.ML

    Theory of Graph Neural Networks: Representation and Learning

    Authors: Stefanie Jegelka

    Abstract: Graph Neural Networks (GNNs), neural network architectures targeted to learning representations of graphs, have become a popular learning model for prediction tasks on nodes, graphs and configurations of points, with wide success in practice. This article summarizes a selection of the emerging theoretical results on approximation and learning properties of widely used message passing GNNs and high…

    Submitted 15 April, 2022; originally announced April 2022.

  29. arXiv:2202.13013 [pdf, other]

    cs.LG stat.ML

    Sign and Basis Invariant Networks for Spectral Graph Representation Learning

    Authors: Derek Lim, Joshua Robinson, Lingxiao Zhao, Tess Smidt, Suvrit Sra, Haggai Maron, Stefanie Jegelka

    Abstract: We introduce SignNet and BasisNet -- new neural architectures that are invariant to two key symmetries displayed by eigenvectors: (i) sign flips, since if $v$ is an eigenvector then so is $-v$; and (ii) more general basis symmetries, which occur in higher dimensional eigenspaces with infinitely many choices of basis eigenvectors. We prove that under certain conditions our networks are universal, i…

    Submitted 30 September, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

    Comments: 42 pages
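    Illustrative sketch of the sign symmetry described in this abstract: for a Laplacian eigenvector v, -v satisfies the same eigen-equation, so features computed from v are ambiguous up to sign. Summing phi(v) + phi(-v) removes the ambiguity for any phi; this is the symmetrization underlying SignNet's sign-invariant encoder, shown here without the outer network or any learned parameters. The graph, the feature map phi, and the helper names are hypothetical.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Laplacian L = D - A of a 3-node path graph (hypothetical example).
L = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]

# A unit eigenvector of L with eigenvalue 1; its negation is an equally valid choice.
v = [1 / math.sqrt(2), 0.0, -1 / math.sqrt(2)]
neg_v = [-x for x in v]

def phi(u):
    # Hypothetical feature map that is NOT sign invariant on its own.
    return [u[0] + 0.5 * u[1] ** 2, max(u)]

def sign_invariant(u):
    # phi(u) + phi(-u) is unchanged when u is replaced by -u.
    a, b = phi(u), phi([-x for x in u])
    return [x + y for x, y in zip(a, b)]

print(matvec(L, v), matvec(L, neg_v))            # equal to 1 * v and 1 * neg_v
print(phi(v), phi(neg_v))                        # differ
print(sign_invariant(v), sign_invariant(neg_v))  # identical
```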

  30. arXiv:2201.11968 [pdf, other]

    cs.LG math.OC

    Training invariances and the low-rank phenomenon: beyond linear networks

    Authors: Thien Le, Stefanie Jegelka

    Abstract: The implicit bias induced by the training of neural networks has become a topic of rigorous study. In the limit of gradient flow and gradient descent with appropriate step size, it has been shown that when one trains a deep linear network with logistic or exponential loss on linearly separable data, the weights converge to rank-1 matrices. In this paper, we extend this theoretical result to the la…

    Submitted 25 April, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: 26 pages, 3 figures, ICLR 2022

  31. arXiv:2201.04309 [pdf, other]

    cs.CV cs.LG

    Robust Contrastive Learning against Noisy Views

    Authors: Ching-Yao Chuang, R Devon Hjelm, Xin Wang, Vibhav Vineet, Neel Joshi, Antonio Torralba, Stefanie Jegelka, Yale Song

    Abstract: Contrastive learning relies on an assumption that positive pairs contain related views, e.g., patches of an image or co-occurring multimodal signals of a video, that share certain underlying information about an instance. But what if this assumption is violated? The literature suggests that contrastive learning produces suboptimal representations in the presence of noisy views, e.g., false positiv…

    Submitted 12 January, 2022; originally announced January 2022.

  32. arXiv:2107.02911 [pdf, other]

    cs.LG stat.ML

    Scaling up Continuous-Time Markov Chains Helps Resolve Underspecification

    Authors: Alkis Gotovos, Rebekka Burkholz, John Quackenbush, Stefanie Jegelka

    Abstract: Modeling the time evolution of discrete sets of items (e.g., genetic mutations) is a fundamental problem in many biomedical applications. We approach this problem through the lens of continuous-time Markov chains, and show that the resulting learning task is generally underspecified in the usual setting of cross-sectional data. We explore a perhaps surprising remedy: including a number of addition…

    Submitted 6 July, 2021; originally announced July 2021.

  33. arXiv:2106.11230 [pdf, other]

    cs.LG

    Can contrastive learning avoid shortcut solutions?

    Authors: Joshua Robinson, Li Sun, Ke Yu, Kayhan Batmanghelich, Stefanie Jegelka, Suvrit Sra

    Abstract: The generalization of representations learned via contrastive learning depends crucially on what features of the data are extracted. However, we observe that the contrastive loss does not always sufficiently guide which features are extracted, a behavior that can negatively impact the performance on downstream tasks via "shortcuts", i.e., by inadvertently suppressing important predictive features.…

    Submitted 19 December, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021

  34. arXiv:2106.04186 [pdf, other]

    cs.LG stat.ML

    What training reveals about neural network complexity

    Authors: Andreas Loukas, Marinos Poiitis, Stefanie Jegelka

    Abstract: This work explores the Benevolent Training Hypothesis (BTH) which argues that the complexity of the function a deep neural network (NN) is learning can be deduced from its training dynamics. Our analysis provides evidence for BTH by relating the NN's Lipschitz constant at different regions of the input space with the behavior of the stochastic training procedure. We first observe that the Lipschitz…

    Submitted 29 October, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Published at NeurIPS 2021

  35. arXiv:2106.03314 [pdf, other]

    cs.LG stat.ML

    Measuring Generalization with Optimal Transport

    Authors: Ching-Yao Chuang, Youssef Mroueh, Kristjan Greenewald, Antonio Torralba, Stefanie Jegelka

    Abstract: Understanding the generalization of deep neural networks is one of the most important tasks in deep learning. Although much progress has been made, theoretical error bounds still often behave disparately from empirical observations. In this work, we develop margin-based generalization bounds, where the margins are normalized with optimal transport costs between independent random subsets sampled f…

    Submitted 7 November, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021

  36. arXiv:2105.04550 [pdf, other]

    cs.LG cs.CV math.OC stat.ML

    Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth

    Authors: Keyulu Xu, Mozhi Zhang, Stefanie Jegelka, Kenji Kawaguchi

    Abstract: Graph Neural Networks (GNNs) have been studied through the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear ra…

    Submitted 26 May, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

  37. arXiv:2012.03174 [pdf, other]

    cs.LG

    Counting Substructures with Higher-Order Graph Neural Networks: Possibility and Impossibility Results

    Authors: Behrooz Tahmasebi, Derek Lim, Stefanie Jegelka

    Abstract: While message passing Graph Neural Networks (GNNs) have become increasingly popular architectures for learning with graphs, recent works have revealed important shortcomings in their expressive power. In response, several higher-order GNNs have been proposed that substantially increase the expressive power, albeit at a large computational cost. Motivated by this gap, we explore alternative strateg…

    Submitted 10 October, 2021; v1 submitted 5 December, 2020; originally announced December 2020.

    Comments: 26 pages, 4 figures

  38. arXiv:2010.04592 [pdf, other]

    cs.LG stat.ML

    Contrastive Learning with Hard Negative Samples

    Authors: Joshua Robinson, Ching-Yao Chuang, Suvrit Sra, Stefanie Jegelka

    Abstract: How can you sample good negative examples for contrastive learning? We argue that, as with metric learning, contrastive learning of representations benefits from hard negative samples (i.e., points that are difficult to distinguish from an anchor point). The key challenge toward using hard negatives is that contrastive methods must remain unsupervised, making it infeasible to adopt existing negati…

    Submitted 24 January, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Published as a conference paper at ICLR 2021
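    Illustrative sketch under stated assumptions (not the paper's full estimator): one label-free way to emphasize hard negatives is to reweight sampled negatives by exp(beta * similarity to the anchor), so negatives that look like the anchor contribute more to the contrastive denominator; beta = 0 recovers the usual uniform InfoNCE-style loss. The embeddings, beta, the temperature t, and the function name are hypothetical, and any debiasing correction for false negatives is omitted.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def weighted_contrastive_loss(anchor, positive, negatives, beta=1.0, t=0.5):
    # InfoNCE-style loss where each negative is reweighted by exp(beta * sim).
    a = normalize(anchor)
    pos = math.exp(dot(a, normalize(positive)) / t)
    sims = [dot(a, normalize(n)) for n in negatives]
    w = [math.exp(beta * s) for s in sims]
    z = sum(w)
    # Weighted average of exp(sim / t), rescaled by the number of negatives.
    neg = len(negatives) * sum(wi * math.exp(s / t) for wi, s in zip(w, sims)) / z
    return -math.log(pos / (pos + neg))

anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.8, 0.2], [-1.0, 0.5], [0.0, 1.0]]  # the first one is a "hard" negative
print(weighted_contrastive_loss(anchor, positive, negatives, beta=0.0))
print(weighted_contrastive_loss(anchor, positive, negatives, beta=2.0))  # larger
```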

  39. arXiv:2009.13504 [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Information Obfuscation of Graph Neural Networks

    Authors: Peiyuan Liao, Han Zhao, Keyulu Xu, Tommi Jaakkola, Geoffrey Gordon, Stefanie Jegelka, Ruslan Salakhutdinov

    Abstract: While the advent of Graph Neural Networks (GNNs) has greatly improved node and graph representation learning in many applications, the neighborhood aggregation scheme exposes additional vulnerabilities to adversaries seeking to extract node-level information about sensitive attributes. In this paper, we study the problem of protecting sensitive attributes by information obfuscation when learning w…

    Submitted 13 June, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

    Comments: ICML 2021; Code is available at https://github.com/liaopeiyuan/GAL

  40. arXiv:2009.11848 [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

    Authors: Keyulu Xu, Mozhi Zhang, Jingling Li, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka

    Abstract: We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while feedforward neural networks, a.k.a. multilayer perceptrons (MLPs), do not extrapolate well in certain simple tasks, Graph Neural Networks (GNNs) -- structured networ…

    Submitted 2 March, 2021; v1 submitted 24 September, 2020; originally announced September 2020.

  41. arXiv:2008.03650 [pdf, ps, other]

    cs.LG math.ST stat.ML

    Testing Determinantal Point Processes

    Authors: Khashayar Gatmiry, Maryam Aliakbarpour, Stefanie Jegelka

    Abstract: Determinantal point processes (DPPs) are popular probabilistic models of diversity. In this paper, we investigate DPPs from a new perspective: property testing of distributions. Given sample access to an unknown distribution $q$ over the subsets of a ground set, we aim to distinguish whether $q$ is a DPP distribution, or $ε$-far from all DPP distributions in $\ell_1$-distance. In this work, we pro…

    Submitted 9 August, 2020; originally announced August 2020.
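    Illustrative sketch (a direct computation, not the paper's tester): in the L-ensemble form of a DPP with a positive semidefinite kernel L, each subset S of the ground set has probability det(L_S) / det(L + I). The code enumerates all subsets of a 3-element ground set, checks that the probabilities sum to one, and computes the l1 distance to a uniform distribution over subsets, the distance in which the testing problem above is posed. The kernel, the comparison distribution, and the helper names are hypothetical.

```python
from itertools import combinations

def det(M):
    # Determinant by cofactor expansion (fine for tiny matrices); det([]) = 1.
    n = len(M)
    if n == 0:
        return 1.0
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def dpp_probs(L):
    # P(S) = det(L_S) / det(L + I) for every subset S of {0, ..., n-1}.
    n = len(L)
    subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
    LI = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    norm = det(LI)
    return {S: det([[L[i][j] for j in S] for i in S]) / norm for S in subsets}

L = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]   # hypothetical PSD kernel: items 0-1 and 1-2 are similar
p = dpp_probs(L)
uniform = {S: 1 / len(p) for S in p}
l1 = sum(abs(p[S] - uniform[S]) for S in p)
print(sum(p.values()), l1)
print(p[(0, 2)], p[(0, 1)])  # the dissimilar pair is more likely: the DPP favours diversity
```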

  42. arXiv:2007.03511 [pdf, other]

    cs.LG stat.ML

    Estimating Generalization under Distribution Shifts via Domain-Invariant Representations

    Authors: Ching-Yao Chuang, Antonio Torralba, Stefanie Jegelka

    Abstract: When machine learning models are deployed on a test distribution different from the training distribution, they can perform poorly, but overestimate their performance. In this work, we aim to better estimate a model's performance under distribution shift, without supervision. To do so, we use a set of domain-invariant predictors as a proxy for the unknown, true target labels. Since the error of th…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: arXiv admin note: text overlap with arXiv:1910.05804

    Journal ref: International Conference on Machine Learning, 2020

  43. arXiv:2007.00224 [pdf, other]

    cs.LG stat.ML

    Debiased Contrastive Learning

    Authors: Ching-Yao Chuang, Joshua Robinson, Lin Yen-Chen, Antonio Torralba, Stefanie Jegelka

    Abstract: A prominent technique for self-supervised representation learning has been to contrast semantically similar and dissimilar pairs of samples. Without access to labels, dissimilar (negative) points are typically taken to be randomly sampled datapoints, implicitly accepting that these points may, in reality, actually have the same label. Perhaps unsurprisingly, we observe that sampling negative examp…

    Submitted 21 October, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

    Journal ref: Advances in Neural Information Processing Systems (2020)

  44. arXiv:2006.06733  [pdf, other]

    math.OC cs.LG

    IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method

    Authors: Yossi Arjevani, Joan Bruna, Bugra Can, Mert Gürbüzbalaban, Stefanie Jegelka, Hongzhou Lin

    Abstract: We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex. Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method, thereby providing a systematic way for deriving several well-known decentralized algorithms including EXTRA arXiv:140…

    Submitted 11 June, 2020; originally announced June 2020.

  45. arXiv:2002.09038  [pdf, other]

    stat.ML cs.LG

    Distributionally Robust Bayesian Optimization

    Authors: Johannes Kirschner, Ilija Bogunovic, Stefanie Jegelka, Andreas Krause

    Abstract: Robustness to distributional shift is one of the key challenges of contemporary machine learning. Attaining such robustness is the goal of distributionally robust optimization, which seeks a solution to an optimization problem that is worst-case robust under a specified distributional shift of an uncontrolled covariate. In this paper, we study such a problem when the distributional shift is measur…

    Submitted 22 March, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: Accepted at AISTATS 2020

  46. arXiv:2002.08483  [pdf, other]

    cs.LG stat.ML

    Strength from Weakness: Fast Learning Using Weak Supervision

    Authors: Joshua Robinson, Stefanie Jegelka, Suvrit Sra

    Abstract: We study generalization properties of weakly supervised learning. That is, learning where only a few "strong" labels (the actual target of our prediction) are present but many more "weak" labels are available. In particular, we show that having access to weak labels can significantly accelerate the learning rate for the strong task to the fast rate of $\mathcal{O}(1/n)$, where $n$ denotes…

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: 21 pages, 8 figures

  47. arXiv:2002.06157  [pdf, ps, other]

    cs.LG stat.ML

    Generalization and Representational Limits of Graph Neural Networks

    Authors: Vikas K. Garg, Stefanie Jegelka, Tommi Jaakkola

    Abstract: We address two fundamental questions about graph neural networks (GNNs). First, we prove that several important graph properties cannot be computed by GNNs that rely entirely on local information. Such GNNs include the standard message passing models, and more powerful spatial variants that exploit local graph structure (e.g., via relative orientation of messages, or local port ordering) to distin…

    Submitted 14 February, 2020; originally announced February 2020.

  48. arXiv:2002.04130  [pdf, other]

    math.OC cs.LG

    Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions

    Authors: Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Ali Jadbabaie, Suvrit Sra

    Abstract: We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We fi…

    Submitted 29 June, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  49. arXiv:2002.03273  [pdf, ps, other]

    cs.LG math.OC stat.ML

    On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions

    Authors: Yossi Arjevani, Amit Daniely, Stefanie Jegelka, Hongzhou Lin

    Abstract: Recent advances in randomized incremental methods for minimizing $L$-smooth $μ$-strongly convex finite sums have culminated in tight complexity of $\tilde{O}((n+\sqrt{n L/μ})\log(1/ε))$ and $O(n+\sqrt{nL/ε})$, where $μ>0$ and $μ=0$, respectively, and $n$ denotes the number of individual functions. Unlike incremental methods, stochastic methods for finite sums do not rely on an explicit knowledge o…

    Submitted 8 February, 2020; originally announced February 2020.

  50. arXiv:1910.12511  [pdf, other]

    cs.LG stat.ML

    Adaptive Sampling for Stochastic Risk-Averse Learning

    Authors: Sebastian Curi, Kfir Y. Levy, Stefanie Jegelka, Andreas Krause

    Abstract: In high-stakes machine learning applications, it is crucial to not only perform well on average, but also when restricted to difficult examples. To address this, we consider the problem of training models in a risk-averse manner. We propose an adaptive sampling algorithm for stochastically optimizing the Conditional Value-at-Risk (CVaR) of a loss distribution, which measures its performance on the…

    Submitted 6 November, 2020; v1 submitted 28 October, 2019; originally announced October 2019.