Showing 1–18 of 18 results for author: Baldassi, C

  1. arXiv:2202.03949  [pdf, other]

    cs.LG stat.ML

    Systematically and efficiently improving $k$-means initialization by pairwise-nearest-neighbor smoothing

    Authors: Carlo Baldassi

    Abstract: We present a meta-method for initializing (seeding) the $k$-means clustering algorithm called PNN-smoothing. It consists in splitting a given dataset into $J$ random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a meta-method in the sense that when clustering the individual subsets any seeding algorithm c…

    Submitted 9 December, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: https://openreview.net/forum?id=FTtFAg3pek 16 pages (+8 appendix), 2 figures, 4 tables (+14 appendix). Transactions on Machine Learning Research, Dec 2022

  2. arXiv:2110.00683  [pdf, other]

    cs.LG cond-mat.dis-nn math.PR stat.ML

    Learning through atypical "phase transitions" in overparameterized neural networks

    Authors: Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Rosalba Pacelli, Gabriele Perugini, Riccardo Zecchina

    Abstract: Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy predictions of statistical learning and pose conceptual challenges for non-convex optimi…

    Submitted 11 June, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: 28 pages, 14 figures

  3. arXiv:2008.00234  [pdf]

    cs.AI econ.TH math.PR stat.ML

    Ergodic Annealing

    Authors: Carlo Baldassi, Fabio Maccheroni, Massimo Marinacci, Marco Pirazzini

    Abstract: Simulated Annealing is the crowning glory of Markov Chain Monte Carlo Methods for the solution of NP-hard optimization problems in which the cost function is known. Here, by replacing the Metropolis engine of Simulated Annealing with a reinforcement learning variation -- that we call Macau Algorithm -- we show that the Simulated Annealing heuristic can be very effective also when the cost function…

    Submitted 1 August, 2020; originally announced August 2020.

  4. arXiv:2006.07897  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Entropic gradient descent algorithms and wide flat minima

    Authors: Fabrizio Pittorino, Carlo Lucibello, Christoph Feinauer, Gabriele Perugini, Carlo Baldassi, Elizaveta Demyanenko, Riccardo Zecchina

    Abstract: The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities with respect to sharp ones. First, we discuss Gaussian mixture classification models and show analytically that there exist Bayes optimal pointwise estimators which correspond to minimizers belonging to wide f…

    Submitted 15 November, 2021; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: ICLR 2021 camera-ready

  5. arXiv:1911.06756  [pdf, other]

    cond-mat.dis-nn stat.ML

    Clustering of solutions in the symmetric binary perceptron

    Authors: Carlo Baldassi, Riccardo Della Vecchia, Carlo Lucibello, Riccardo Zecchina

    Abstract: The geometrical features of the (non-convex) loss landscape of neural network models are crucial in ensuring successful optimization and, most importantly, the capability to generalize well. While minimizers' flatness consistently correlates with good generalization, there has been little rigorous work in exploring the condition of existence of such minimizers, even in toy models. Here we consider…

    Submitted 11 May, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

    Journal ref: J. Stat. Mech. (2020) 073303

  6. arXiv:1909.13327  [pdf, other]

    q-bio.QM cs.LG stat.ML

    Natural representation of composite data with replicated autoencoders

    Authors: Matteo Negri, Davide Bergamini, Carlo Baldassi, Riccardo Zecchina, Christoph Feinauer

    Abstract: Generative processes in biology and other fields often produce data that can be regarded as resulting from a composition of basic features. Here we present an unsupervised method based on autoencoders for inferring these basic features of data. The main novelty in our approach is that the training is based on the optimization of the `local entropy' rather than the standard loss, resulting in a mor…

    Submitted 29 September, 2019; originally announced September 2019.

    Comments: 11 pages, 4 figures

  7. arXiv:1907.07578  [pdf, other]

    cond-mat.dis-nn cs.LG stat.ML

    Properties of the geometry of solutions and capacity of multi-layer neural networks with Rectified Linear Units activations

    Authors: Carlo Baldassi, Enrico M. Malatesta, Riccardo Zecchina

    Abstract: Rectified Linear Units (ReLU) have become the main model for the neural units in current deep learning systems. This choice was originally suggested as a way to compensate for the so-called vanishing gradient problem which can undercut stochastic gradient descent (SGD) learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity…

    Submitted 3 May, 2024; v1 submitted 17 July, 2019; originally announced July 2019.

    Comments: 11 pages, 3 figures

    Journal ref: Phys. Rev. Lett. 123, 170602 (2019)

  8. arXiv:1905.07833  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Shaping the learning landscape in neural networks around wide flat minima

    Authors: Carlo Baldassi, Fabrizio Pittorino, Riccardo Zecchina

    Abstract: Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under c…

    Submitted 11 March, 2020; v1 submitted 19 May, 2019; originally announced May 2019.

    Comments: 37 pages (16 main text), 10 figures (7 main text)

    Journal ref: Proceedings of the National Academy of Sciences, 2020 Jan 7, 117 (1) 161-170

  9. Recombinator-k-means: An evolutionary algorithm that exploits k-means++ for recombination

    Authors: Carlo Baldassi

    Abstract: We introduce an evolutionary algorithm called recombinator-$k$-means for optimizing the highly non-convex $k$-means problem. Its defining feature is that its crossover step involves all the members of the current generation, stochastically recombining them with a repurposed variant of the $k$-means++ seeding algorithm. The recombination also uses a reweighting mechanism that realizes a progressively…

    Submitted 14 January, 2022; v1 submitted 1 May, 2019; originally announced May 2019.

    Comments: 18 pages, 5 figures (1 in main text), 7 tables (5 in main text)

  10. arXiv:1710.09825  [pdf, other]

    cond-mat.dis-nn cs.LG cs.NE stat.ML

    On the role of synaptic stochasticity in training low-precision neural networks

    Authors: Carlo Baldassi, Federica Gerace, Hilbert J. Kappen, Carlo Lucibello, Luca Saglietti, Enzo Tartaglione, Riccardo Zecchina

    Abstract: Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties such as robustness and good generalization performanc…

    Submitted 19 March, 2018; v1 submitted 26 October, 2017; originally announced October 2017.

    Comments: 7 pages + 14 pages of supplementary material

    Journal ref: Phys. Rev. Lett. 120, 268103 (2018)

  11. arXiv:1707.00424  [pdf, other]

    cs.LG cs.DC stat.ML

    Parle: parallelizing stochastic gradient descent

    Authors: Pratik Chaudhari, Carlo Baldassi, Riccardo Zecchina, Stefano Soatto, Ameet Talwalkar, Adam Oberman

    Abstract: We propose a new algorithm called Parle for parallel training of deep networks that converges 2-4x faster than a data-parallel implementation of SGD, while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks including CIFAR-10 and CIFAR-100, without introducing any additional hyper-parameters. We exploit the phenomenon of flat minima that has been sh…

    Submitted 10 September, 2017; v1 submitted 3 July, 2017; originally announced July 2017.

  12. arXiv:1706.08470  [pdf, other]

    quant-ph cond-mat.dis-nn cs.LG stat.ML

    Efficiency of quantum versus classical annealing in non-convex learning problems

    Authors: Carlo Baldassi, Riccardo Zecchina

    Abstract: Quantum annealers aim at solving non-convex optimization problems by exploiting cooperative tunneling effects to escape local minima. The underlying idea consists in designing a classical energy function whose ground states are the sought optimal solutions of the original optimization problem and adding a controllable quantum transverse field to generate tunneling processes. A key challenge is to ide…

    Submitted 16 October, 2017; v1 submitted 26 June, 2017; originally announced June 2017.

    Comments: 31 pages, 10 figures

    Journal ref: Proceedings of the National Academy of Sciences Jan 2018, 201711456

  13. arXiv:1611.01838  [pdf, other]

    cs.LG stat.ML

    Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

    Authors: Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina

    Abstract: This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage this observation to construct a local-entropy-based object…

    Submitted 21 April, 2017; v1 submitted 6 November, 2016; originally announced November 2016.

    Comments: ICLR '17

  14. arXiv:1605.06444  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes

    Authors: Carlo Baldassi, Christian Borgs, Jennifer Chayes, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, Riccardo Zecchina

    Abstract: In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost-function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here w…

    Submitted 6 October, 2016; v1 submitted 20 May, 2016; originally announced May 2016.

    Comments: 31 pages (14 main text, 18 appendix), 12 figures (6 main text, 6 appendix)

    Journal ref: Proc. Natl. Acad. Sci. U.S.A. 113(48):E7655-E7662, 2016

  15. arXiv:1602.04129  [pdf, ps, other]

    cond-mat.dis-nn q-bio.NC stat.ML

    Learning may need only a few bits of synaptic precision

    Authors: Carlo Baldassi, Federica Gerace, Carlo Lucibello, Luca Saglietti, Riccardo Zecchina

    Abstract: Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic states. The choice of discrete synapses is motivated by biological reasoning and experiments, and possibly by hardware implementation considerations as well. In this paper we extend a previous large deviations analysis which unveiled the existence of peculiar dense regions in the space of s…

    Submitted 27 May, 2016; v1 submitted 12 February, 2016; originally announced February 2016.

    Comments: 38 pages (main text: 16 pages), 5 figures; http://link.aps.org/doi/10.1103/PhysRevE.93.052313

    Journal ref: Phys. Rev. E 93, 052313 (2016)

  16. arXiv:1511.05634  [pdf, ps, other]

    cond-mat.dis-nn stat.ML

    Local entropy as a measure for sampling solutions in Constraint Satisfaction Problems

    Authors: Carlo Baldassi, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, Riccardo Zecchina

    Abstract: We introduce a novel Entropy-driven Monte Carlo (EdMC) strategy to efficiently sample solutions of random Constraint Satisfaction Problems (CSPs). First, we extend a recent result that, using a large-deviation analysis, shows that the geometry of the space of solutions of the Binary Perceptron Learning Problem (a prototypical CSP) contains regions with a very high density of solutions. Despite being…

    Submitted 25 February, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

    Comments: 46 pages (main text: 22), 7 figures. This is an author-created, un-copyedited version of an article published in Journal of Statistical Mechanics: Theory and Experiment. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. The Version of Record is available online at http://dx.doi.org/10.1088/1742-5468/2016/02/023301

    ACM Class: G.1.6; I.2.M

    Journal ref: J. Stat. Mech. 2016 (2) 023301

  17. arXiv:1509.05753  [pdf, other]

    cond-mat.dis-nn q-bio.NC stat.ML

    Subdominant Dense Clusters Allow for Simple Learning and High Computational Performance in Neural Networks with Discrete Synapses

    Authors: Carlo Baldassi, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, Riccardo Zecchina

    Abstract: We show that discrete synaptic weights can be efficiently used for learning in large scale neural systems, and lead to unanticipated computational performance. We focus on the representative case of learning random patterns with binary synapses in single layer networks. The standard statistical analysis shows that this problem is exponentially dominated by isolated solutions that are extremely har…

    Submitted 18 September, 2015; originally announced September 2015.

    Comments: 11 pages, 4 figures (main text: 5 pages, 3 figures; Supplemental Material: 6 pages, 1 figure)

    Journal ref: Physical Review Letters, 115, 128101 (2015) url=http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.115.128101

  18. arXiv:1404.1240  [pdf, other]

    q-bio.QM cond-mat.dis-nn stat.ME

    Fast and accurate multivariate Gaussian modeling of protein families: Predicting residue contacts and protein-interaction partners

    Authors: Carlo Baldassi, Marco Zamparo, Christoph Feinauer, Andrea Procaccini, Riccardo Zecchina, Martin Weigt, Andrea Pagnani

    Abstract: In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence inform…

    Submitted 4 April, 2014; originally announced April 2014.

    Comments: 24 pages, 7 pdf figures, 2 tables, plus supporting information. Published in PLOS ONE

    Journal ref: PLoS ONE 9(3): e92721