Showing 1–10 of 10 results for author: Tsilivis, N

  1. arXiv:2406.04981

    cs.LG stat.ML

    The Price of Implicit Bias in Adversarially Robust Generalization

    Authors: Nikolaos Tsilivis, Natalie Frank, Nathan Srebro, Julia Kempe

    Abstract: We study the implicit bias of optimization in robust empirical risk minimization (robust ERM) and its connection with robust generalization. In classification settings under adversarial perturbations with linear models, we study what type of regularization should ideally be applied for a given perturbation set to improve (robust) generalization. We then show that the implicit bias of optimization…

    Submitted 7 June, 2024; originally announced June 2024.

  2. arXiv:2404.19640

    cs.LG cs.AI cs.CV stat.ME stat.ML

    Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks

    Authors: Yunzhen Feng, Tim G. J. Rudner, Nikolaos Tsilivis, Julia Kempe

    Abstract: Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations. In this work, we examine this claim. To study the adversarial robustness of BNNs, we investigate whether it is possible to successfully break state-of-the-art BNN infe…

    Submitted 26 April, 2024; originally announced April 2024.

  3. arXiv:2402.11004

    cs.LG

    The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains

    Authors: Benjamin L. Edelman, Ezra Edelman, Surbhi Goel, Eran Malach, Nikolaos Tsilivis

    Abstract: Large language models have the ability to generate text that mimics patterns in their inputs. We introduce a simple Markov Chain sequence modeling task in order to study how this in-context learning (ICL) capability emerges. In our setting, each example is sampled from a Markov chain drawn from a prior distribution over Markov chains. Transformers trained on this task form \emph{statistical induct…

    Submitted 16 February, 2024; originally announced February 2024.

  4. arXiv:2311.07444

    cs.LG

    On the Robustness of Neural Collapse and the Neural Collapse of Robustness

    Authors: Jingtong Su, Ya Shi Zhang, Nikolaos Tsilivis, Julia Kempe

    Abstract: Neural Collapse refers to the curious phenomenon in the end of training of a neural network, where feature vectors and classification weights converge to a very simple geometrical arrangement (a simplex). While it has been observed empirically in various cases and has been theoretically motivated, its connection with crucial properties of neural networks, like their generalization and robustness,…

    Submitted 13 November, 2023; originally announced November 2023.

  5. arXiv:2307.02693

    cs.LG stat.ML

    Kernels, Data & Physics

    Authors: Francesco Cagnetta, Deborah Oliveira, Mahalakshmi Sabanayagam, Nikolaos Tsilivis, Julia Kempe

    Abstract: Lecture notes from the course given by Professor Julia Kempe at the summer school "Statistical physics of Machine Learning" in Les Houches. The notes discuss the so-called NTK approach to problems in machine learning, which consists of gaining an understanding of generally unsolvable problems by finding a tractable kernel formulation. The notes are mainly focused on practical applications such as…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: These are notes from the lecture given by Julia Kempe at the summer school "Statistical Physics & Machine Learning", which took place at the Les Houches School of Physics in France from 4 to 29 July 2022

  6. arXiv:2303.11873

    cs.LG

    A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks

    Authors: William Merrill, Nikolaos Tsilivis, Aman Shukla

    Abstract: Grokking is a phenomenon where a model trained on an algorithmic task first overfits but, then, after a large amount of additional training, undergoes a phase transition to generalize perfectly. We empirically study the internal structure of networks undergoing grokking on the sparse parity task, and find that the grokking phase transition corresponds to the emergence of a sparse subnetwork that d…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Published at the Workshop on Understanding Foundation Models at ICLR 2023

  7. arXiv:2210.05577

    cs.LG cs.CR

    What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness?

    Authors: Nikolaos Tsilivis, Julia Kempe

    Abstract: The adversarial vulnerability of neural nets, and subsequent techniques to create robust models have attracted significant attention; yet we still lack a full understanding of this phenomenon. Here, we study adversarial examples of trained neural networks through analytical tools afforded by recent theory advances connecting neural networks and kernel methods, namely the Neural Tangent Kernel (NTK…

    Submitted 30 January, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022; added link to GitHub repository

  8. arXiv:2207.11727

    cs.LG cs.CV

    Can we achieve robustness from data alone?

    Authors: Nikolaos Tsilivis, Jingtong Su, Julia Kempe

    Abstract: We introduce a meta-learning algorithm for adversarially robust classification. The proposed method tries to be as model agnostic as possible and optimizes a dataset prior to its deployment in a machine learning system, aiming to effectively erase its non-robust features. Once the dataset has been created, in principle no specialized algorithm (besides standard gradient descent) is needed to train…

    Submitted 30 January, 2023; v1 submitted 24 July, 2022; originally announced July 2022.

  9. arXiv:2201.12451

    cs.LG

    Extracting Finite Automata from RNNs Using State Merging

    Authors: William Merrill, Nikolaos Tsilivis

    Abstract: One way to interpret the behavior of a blackbox recurrent neural network (RNN) is to extract from it a more interpretable discrete computational model, like a finite state machine, that captures its behavior. In this work, we propose a new method for extracting finite automata from RNNs inspired by the state merging paradigm from grammatical inference. We demonstrate the effectiveness of our metho…

    Submitted 13 April, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: Preprint

  10. arXiv:2011.04468

    math.OC cs.LG math.RA stat.ML

    Sparse Approximate Solutions to Max-Plus Equations with Application to Multivariate Convex Regression

    Authors: Nikos Tsilivis, Anastasios Tsiamis, Petros Maragos

    Abstract: In this work, we study the problem of finding approximate, with minimum support set, solutions to matrix max-plus equations, which we call sparse approximate solutions. We show how one can obtain such solutions efficiently and in polynomial time for any $\ell_p$ approximation error. Based on these results, we propose a novel method for piecewise-linear fitting of convex multivariate functions, wit…

    Submitted 21 December, 2020; v1 submitted 6 November, 2020; originally announced November 2020.

    Comments: 20 pages, 5 figures, 5 tables. Revised introduction and corrected typos