Showing 1–14 of 14 results for author: Nica, M

  1. arXiv:2311.00096

    cs.LG cs.AI

    Bandit-Driven Batch Selection for Robust Learning under Label Noise

    Authors: Michal Lisicki, Mihai Nica, Graham W. Taylor

    Abstract: We introduce a novel approach for batch selection in Stochastic Gradient Descent (SGD) training, leveraging combinatorial bandit algorithms. Our methodology focuses on optimizing the learning process in the presence of label noise, a prevalent issue in real-world datasets. Experimental evaluations on the CIFAR-10 dataset reveal that our approach consistently outperforms existing methods across var…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: WANT@NeurIPS 2023 & OPT@NeurIPS 2023

  2. arXiv:2310.12079

    stat.ML cs.LG

    Differential Equation Scaling Limits of Shaped and Unshaped Neural Networks

    Authors: Mufan Bill Li, Mihai Nica

    Abstract: Recent analyses of neural networks with shaped activations (i.e. the activation function is scaled as the network size grows) have led to scaling limits described by differential equations. However, these results do not a priori tell us anything about "ordinary" unshaped networks, where the activation is unchanged as the network size grows. In this article, we find similar differential equation ba…

    Submitted 18 April, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

  3. arXiv:2309.02530

    cs.LG stat.ML

    Diffusion on the Probability Simplex

    Authors: Griffin Floto, Thorsteinn Jonsson, Mihai Nica, Scott Sanner, Eric Zhengyu Zhu

    Abstract: Diffusion models learn to reverse the progressive noising of a data distribution to create a generative model. However, the desired continuous nature of the noising process can be at odds with discrete data. To deal with this tension between continuous and discrete objects, we propose a method of performing diffusion on the probability simplex. Using the probability simplex naturally creates an in…

    Submitted 11 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

  4. arXiv:2306.01513

    cs.LG stat.ML

    Network Degeneracy as an Indicator of Training Performance: Comparing Finite and Infinite Width Angle Predictions

    Authors: Cameron Jakub, Mihai Nica

    Abstract: Neural networks are powerful functions with widespread use, but the theoretical behaviour of these functions is not fully understood. Creating deep neural networks by stacking many layers has achieved exceptional performance in many applications and contributed to the recent explosion of these methods. Previous works have shown that depth can exponentially increase the expressibility of the networ…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: 5 pages, comments welcome

  5. arXiv:2305.02299

    cs.LG cs.CV

    Dynamic Sparse Training with Structured Sparsity

    Authors: Mike Lasby, Anna Golubeva, Utku Evci, Mihai Nica, Yani Ioannou

    Abstract: Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training, matching the generalization of dense models while enabling sparse training and inference. Although the resulting models are highly sparse and theoretically less computationally expensive, achieving speedups with unstructured sparsity on real-world hardware is challenging. In this work, we prop…

    Submitted 21 February, 2024; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: ICLR 2024, 29 pages, 22 figures

  6. arXiv:2302.09712

    stat.ML cs.LG math.PR

    Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization

    Authors: Cameron Jakub, Mihai Nica

    Abstract: Despite remarkable performance on a variety of tasks, many properties of deep neural networks are not yet theoretically understood. One such mystery is the depth degeneracy phenomenon: the deeper you make your network, the closer your network is to a constant function on initialization. In this paper, we examine the evolution of the angle between two inputs to a ReLU neural network as a function o…

    Submitted 26 May, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: Minor updates and exposition improved. 37 pages, comments welcome

  7. arXiv:2207.09408

    cs.LG cs.AI

    Bounding generalization error with input compression: An empirical study with infinite-width networks

    Authors: Angus Galloway, Anna Golubeva, Mahmoud Salem, Mihai Nica, Yani Ioannou, Graham W. Taylor

    Abstract: Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles to reduce a reliance on trial-and-error, along with other performance assessment advantages. In search of a quantity relevant to GE, we investigate…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: 12 pages main content, 26 pages total

  8. arXiv:2206.02768

    stat.ML cs.LG

    The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization

    Authors: Mufan Bill Li, Mihai Nica, Daniel M. Roy

    Abstract: The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current inf…

    Submitted 14 June, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: 48 pages, 10 figures. Advances in Neural Information Processing Systems (2022)

  9. arXiv:2111.15646

    cs.LG cs.CV stat.ML

    The Exponentially Tilted Gaussian Prior for Variational Autoencoders

    Authors: Griffin Floto, Stefan Kremer, Mihai Nica

    Abstract: An important property for deep neural networks is the ability to perform robust out-of-distribution detection on previously unseen data. This property is essential for safety purposes when deploying models for real world applications. Recent studies show that probabilistic generative models can perform poorly on this task, which is surprising given that they seek to estimate the likelihood of trai…

    Submitted 12 April, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

  10. arXiv:2110.08664

    cs.SE cs.AI eess.SY

    Finding Critical Scenarios for Automated Driving Systems: A Systematic Literature Review

    Authors: Xinhai Zhang, Jianbo Tao, Kaige Tan, Martin Törngren, José Manuel Gaspar Sánchez, Muhammad Rusyadi Ramli, Xin Tao, Magnus Gyllenhammar, Franz Wotawa, Naveen Mohan, Mihai Nica, Hermann Felbinger

    Abstract: Scenario-based approaches have been receiving a huge amount of attention in research and engineering of automated driving systems. Due to the complexity and uncertainty of the driving environment, and the complexity of the driving task itself, the number of possible driving scenarios that an ADS or ADAS may encounter is virtually infinite. Therefore it is essential to be able to reason about the i…

    Submitted 16 October, 2021; originally announced October 2021.

    Comments: 37 pages, 24 figures

  11. arXiv:2106.04013

    stat.ML cs.LG

    The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization

    Authors: Mufan Bill Li, Mihai Nica, Daniel M. Roy

    Abstract: Theoretical results show that neural networks can be approximated by Gaussian processes in the infinite-width limit. However, for fully connected networks, it has been previously shown that for any fixed network width, $n$, the Gaussian approximation gets worse as the network depth, $d$, increases. Given that modern networks are deep, this raises the question of how well modern architectures, like…

    Submitted 27 October, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

  12. arXiv:2001.06145

    cs.LG math.PR stat.ML

    A Derivative-Free Method for Solving Elliptic Partial Differential Equations with Deep Neural Networks

    Authors: Jihun Han, Mihai Nica, Adam R Stinchcombe

    Abstract: We introduce a deep neural network based method for solving a class of elliptic partial differential equations. We approximate the solution of the PDE with a deep neural network which is trained under the guidance of a probabilistic representation of the PDE in the spirit of the Feynman-Kac formula. The solution is given by an expectation of a martingale process driven by a Brownian motion. As Bro…

    Submitted 16 January, 2020; originally announced January 2020.

    Comments: 25 pages, 4 figures

  13. arXiv:1909.05989

    cs.LG math.PR stat.ML

    Finite Depth and Width Corrections to the Neural Tangent Kernel

    Authors: Boris Hanin, Mihai Nica

    Abstract: We prove the precise scaling, at finite depth and width, for the mean and variance of the neural tangent kernel (NTK) in a randomly initialized ReLU network. The standard deviation is exponential in the ratio of network depth to width. Thus, even in the limit of infinite overparameterization, the NTK is not deterministic if depth and width simultaneously tend to infinity. Moreover, we prove that f…

    Submitted 12 September, 2019; originally announced September 2019.

    Comments: 27 pages, 2 figures, comments welcome

  14. arXiv:1509.03327

    math.PR cs.GT math.OC

    Optimal Strategy in "Guess Who?": Beyond Binary Search

    Authors: Mihai Nica

    Abstract: "Guess Who?" is a popular two player game where players ask "Yes"/"No" questions to search for their opponent's secret identity from a pool of possible candidates. This is modeled as a simple stochastic game. Using this model, the optimal strategy is explicitly found. Contrary to popular belief, performing a binary search is \emph{not} always optimal. Instead, the optimal strategy for the player w…

    Submitted 15 January, 2016; v1 submitted 8 September, 2015; originally announced September 2015.

    Comments: 13 pages, 2 figures. Derivation rewritten from the point of view of "Continuous Guess Who?". To appear in Probability in the Engineering and Informational Sciences

    MSC Class: 91A15; 60G40; 62L15; 91A60

    Journal ref: Prob. Eng. Inf. Sci. 30 (2016) 576-592