Showing 1–16 of 16 results for author: Damian, A

Searching in archive cs.

  1. arXiv:2403.05529  [pdf, other]

    cs.LG stat.ML

    Computational-Statistical Gaps in Gaussian Single-Index Models

    Authors: Alex Damian, Loucas Pillaud-Vivien, Jason D. Lee, Joan Bruna

    Abstract: Single-Index Models are high-dimensional regression problems with planted structure, whereby labels depend on an unknown one-dimensional projection of the input via a generic, non-linear, and potentially non-deterministic transformation. As such, they encompass a broad class of statistical inference tasks, and provide a rich template to study statistical and computational trade-offs in the high-di…

    Submitted 12 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 61 pages

  2. arXiv:2402.14735  [pdf, other]

    cs.LG cs.IT stat.ML

    How Transformers Learn Causal Structure with Gradient Descent

    Authors: Eshaan Nichani, Alex Damian, Jason D. Lee

    Abstract: The incredible success of transformers on sequence modeling tasks can be largely attributed to the self-attention mechanism, which allows information to be transferred between different parts of a sequence. Self-attention allows transformers to encode causal structure which makes them particularly suitable for sequence modeling. However, the process by which transformers learn such causal structur…

    Submitted 22 February, 2024; originally announced February 2024.

  3. arXiv:2306.08708  [pdf, other]

    cs.AI cs.DC cs.NI

    Naeural AI OS -- Decentralized ubiquitous computing MLOps execution engine

    Authors: Beatrice Milik, Stefan Saraev, Cristian Bleotiu, Radu Lupaescu, Bogdan Hobeanu, Andrei Ionut Damian

    Abstract: Over the past few years, ubiquitous, or pervasive, computing has gained popularity as the primary approach for a wide range of applications, including enterprise-grade systems, consumer applications, and gaming systems. Ubiquitous computing refers to the integration of computing technologies into everyday objects and environments, creating a network of interconnected devices that can communicate wi…

    Submitted 15 April, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: preprint

    ACM Class: I.2.5; I.2.11

  4. arXiv:2305.17333  [pdf, other]

    cs.LG cs.CL

    Fine-Tuning Language Models with Just Forward Passes

    Authors: Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora

    Abstract: Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory. Zeroth-order (ZO) methods can in principle estimate gradients using only two forward passes but are theorized to be catastrophically slow for optimizing large models. In this work, we propose a memory-efficient zeroth-order opti…

    Submitted 11 January, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted by NeurIPS 2023 (oral). Code available at https://github.com/princeton-nlp/MeZO

  5. arXiv:2305.10633  [pdf, other]

    cs.LG cs.IT stat.ML

    Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models

    Authors: Alex Damian, Eshaan Nichani, Rong Ge, Jason D. Lee

    Abstract: We focus on the task of learning a single index model $σ(w^\star \cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning $w^\star$ is governed by the information exponent $k^\star$ of the link function $σ$, which is defined as the index of the first nonzero Hermite coefficient of $σ$. Ben Arous et al. (2021) showe…

    Submitted 17 May, 2023; originally announced May 2023.

  6. arXiv:2305.06986  [pdf, other]

    cs.LG stat.ML

    Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks

    Authors: Eshaan Nichani, Alex Damian, Jason D. Lee

    Abstract: One of the central questions in the theory of deep learning is to understand how neural networks learn hierarchical features. The ability of deep networks to extract salient features is crucial to both their outstanding generalization ability and the modern deep learning paradigm of pretraining and finetuning. However, this feature learning process remains poorly understood from a theoretical per…

    Submitted 31 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: v2: NeurIPS 2023 camera ready

  7. arXiv:2209.15594  [pdf, other]

    cs.LG cs.IT math.OC stat.ML

    Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability

    Authors: Alex Damian, Eshaan Nichani, Jason D. Lee

    Abstract: Traditional analyses of gradient descent show that when the largest eigenvalue of the Hessian, also known as the sharpness $S(θ)$, is bounded by $2/η$, training is "stable" and the training loss decreases monotonically. Recent works, however, have observed that this assumption does not hold when training modern neural networks with full batch or large batch gradient descent. Most recently, Cohen e…

    Submitted 10 April, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: ICLR 2023, first two authors contributed equally

  8. arXiv:2206.15144  [pdf, other]

    cs.LG cs.IT stat.ML

    Neural Networks can Learn Representations with Gradient Descent

    Authors: Alex Damian, Jason D. Lee, Mahdi Soltanolkotabi

    Abstract: Significant theoretical work has established that in specific regimes, neural networks trained by gradient descent behave like kernel methods. However, in practice, it is known that neural networks strongly outperform their associated kernels. In this work, we explain this gap by demonstrating that there is a large class of functions which cannot be efficiently learned by kernel methods but can be…

    Submitted 30 June, 2022; originally announced June 2022.

    Comments: COLT 2022

  9. arXiv:2112.11925  [pdf, other]

    cs.LG

    SOLIS -- The MLOps journey from data acquisition to actionable insights

    Authors: Razvan Ciobanu, Alexandru Purdila, Laurentiu Piciu, Andrei Damian

    Abstract: Machine Learning operations is unarguably a very important and also one of the hottest topics in Artificial Intelligence lately. Being able to define very clear hypotheses for actual real-life problems that can be addressed by machine learning models, collecting and curating large amounts of data for model training and validation followed by model architecture search and actual optimization and fi…

    Submitted 28 January, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

  10. arXiv:2106.06530  [pdf, other]

    cs.LG cs.IT math.OC stat.ML

    Label Noise SGD Provably Prefers Flat Global Minimizers

    Authors: Alex Damian, Tengyu Ma, Jason D. Lee

    Abstract: In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to. Motivated by empirical studies that demonstrate that training with noisy labels improves generalization, we study the implicit regularization effect of SGD with label noise. We show that SGD with label noise converges to…

    Submitted 4 December, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: 57 pages, 5 figures, NeurIPS 2021

  11. arXiv:2006.14994  [pdf]

    cs.CL cs.LG

    ProVe -- Self-supervised pipeline for automated product replacement and cold-starting based on neural language models

    Authors: Andrei Ionut Damian, Laurentiu Piciu, Cosmin Mihai Marinescu

    Abstract: In retail vertical industries, businesses are dealing with the human limitation of quickly understanding and adapting to new purchasing behaviors. Moreover, retail businesses need to overcome the human limitation of properly managing a massive selection of products/brands/categories. These limitations lead to deficiencies from both commercial (e.g. loss of sales, decrease in customer satisfaction) and…

    Submitted 12 January, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

  12. arXiv:2003.03808  [pdf, other]

    cs.CV cs.LG eess.IV

    PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

    Authors: Sachit Menon, Alexandru Damian, Shijia Hu, Nikhil Ravi, Cynthia Rudin

    Abstract: The primary aim of single-image super-resolution is to construct high-resolution (HR) images from corresponding low-resolution (LR) inputs. In previous approaches, which have generally been supervised, the training objective typically measures a pixel-wise average distance between the super-resolved (SR) and HR images. Optimizing such metrics often leads to blurring, especially in high variance (d…

    Submitted 20 July, 2020; v1 submitted 8 March, 2020; originally announced March 2020.

    Comments: Sachit Menon and Alexandru Damian contributed equally. Computer Vision and Pattern Recognition (CVPR) 2020

  13. arXiv:1911.01346  [pdf]

    cs.CV cs.LG eess.IV

    CloudifierNet -- Deep Vision Models for Artificial Image Processing

    Authors: Andrei Damian, Laurentiu Piciu, Alexandru Purdila, Nicolae Tapus

    Abstract: Today, more and more, it is necessary for most applications and documents developed with previous or current technologies to be accessible online on cloud-based infrastructures. That is why the migration of legacy systems, including their hosts of documents, to new technologies and online infrastructures using modern Artificial Intelligence techniques is absolutely necessary. With the advancement o…

    Submitted 28 July, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

    Comments: ITQM 2019

  14. arXiv:1904.07687  [pdf]

    cs.IR cs.LG

    Advanced Customer Activity Prediction based on Deep Hierarchic Encoder-Decoders

    Authors: Andrei Damian, Laurentiu Piciu, Sergiu Turlea, Nicolae Tapus

    Abstract: Product recommender systems and customer profiling techniques have always been a priority in online retail. Recent machine learning research advances and the wide availability of massively parallel numerical computing have enabled various approaches and directions of recommender systems advancement. It is worth mentioning that in past years multiple traditional "offline" retail businesses are ge…

    Submitted 21 June, 2019; v1 submitted 11 April, 2019; originally announced April 2019.

    Comments: 2019 22nd International Conference on Control Systems and Computer Science (CSCS)

    ACM Class: I.2.4

  15. arXiv:1903.09942  [pdf]

    cs.IR cs.AI cs.CL cs.LG

    Deep recommender engine based on efficient product embeddings neural pipeline

    Authors: Laurentiu Piciu, Andrei Damian, Nicolae Tapus, Andrei Simion-Constantinescu, Bogdan Dumitrescu

    Abstract: Predictive analytics systems are currently one of the most important areas of research and development within the Artificial Intelligence domain and particularly in Machine Learning. One of the "holy grails" of predictive analytics is the research and development of the "perfect" recommendation system. In our paper, we propose an advanced pipeline model for the multi-task objective of determining…

    Submitted 22 July, 2019; v1 submitted 24 March, 2019; originally announced March 2019.

    Comments: 2018 17th RoEduNet Conference: Networking in Education and Research (RoEduNet)

  16. arXiv:1805.03383  [pdf, other]

    cs.CV

    New Techniques for Preserving Global Structure and Denoising with Low Information Loss in Single-Image Super-Resolution

    Authors: Yijie Bei, Alex Damian, Shijia Hu, Sachit Menon, Nikhil Ravi, Cynthia Rudin

    Abstract: This work identifies and addresses two important technical challenges in single-image super-resolution: (1) how to upsample an image without magnifying noise and (2) how to preserve large scale structure when upsampling. We summarize the techniques we developed for our second place entry in Track 1 (Bicubic Downsampling), seventh place entry in Track 2 (Realistic Adverse Conditions), and seventh p…

    Submitted 15 June, 2018; v1 submitted 9 May, 2018; originally announced May 2018.

    Comments: 8 pages, CVPR workshop 2018
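
Entry 4 in the list above notes that zeroth-order (ZO) methods can estimate gradients using only two forward passes. One standard estimator of this kind is the two-point simultaneous-perturbation (SPSA) estimate $\hat g = \frac{L(\theta + \epsilon z) - L(\theta - \epsilon z)}{2\epsilon}\, z$ with $z \sim \mathcal{N}(0, I)$, which needs two loss evaluations and no backpropagation. The minimal NumPy sketch below plugs it into plain SGD on a toy quadratic loss. It is an illustration under these assumptions, not the MeZO code from the linked repository; the names `spsa_gradient` and `zo_sgd` and all hyperparameter values are hypothetical.

```python
import numpy as np


def spsa_gradient(loss, theta, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate (hypothetical helper).

    Uses only two evaluations of `loss` ("forward passes") and no
    backpropagation. With z ~ N(0, I), E[z z^T] = I, so the estimate is
    unbiased for the gradient of a quadratic loss.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(theta.shape)                 # random perturbation direction
    loss_plus = loss(theta + eps * z)                    # forward pass 1
    loss_minus = loss(theta - eps * z)                   # forward pass 2
    directional = (loss_plus - loss_minus) / (2 * eps)   # approximates z . grad L(theta)
    return directional * z


def zo_sgd(loss, theta0, lr=0.01, steps=2000, eps=1e-3, seed=0):
    """Plain SGD driven by the two-point estimate (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)                # work on a float copy
    for _ in range(steps):
        theta -= lr * spsa_gradient(loss, theta, eps, rng)
    return theta


if __name__ == "__main__":
    target = np.arange(1.0, 6.0)
    loss = lambda th: 0.5 * float(np.sum((th - target) ** 2))
    print(zo_sgd(loss, np.zeros(5)))                     # should approach target
```

Each step touches the parameters only through `loss`, which is what makes this family of methods attractive when backpropagation memory is the bottleneck; the price is a noisier gradient estimate whose second moment grows with the dimension of $\theta$.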
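
Entry 5 in the list above defines the information exponent $k^\star$ of a link function $σ$ as the index of the first nonzero Hermite coefficient of $σ$. The following is a minimal sketch, not code from the paper, that estimates these coefficients numerically with Gauss-Hermite quadrature in the probabilists' Hermite basis $He_k$ (NumPy's `numpy.polynomial.hermite_e` module). Two assumptions are built in: the constant coefficient $c_0$ is skipped, so the search starts at $k = 1$, and a coefficient counts as nonzero when its magnitude exceeds a small tolerance. The helper names `hermite_coefficients` and `information_exponent` are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_coefficients(sigma, max_degree=10, n_nodes=80):
    """Coefficients c_k of sigma in the probabilists' Hermite basis He_k.

    Uses E[He_j(Z) He_k(Z)] = k! * delta_jk for Z ~ N(0, 1), so
    c_k = E[sigma(Z) He_k(Z)] / k!. The expectation is computed with
    Gauss-Hermite quadrature for the weight exp(-x^2 / 2).
    """
    nodes, weights = hermegauss(n_nodes)
    weights = weights / math.sqrt(2 * math.pi)  # integral -> expectation under N(0, 1)
    values = sigma(nodes)
    coeffs = []
    for k in range(max_degree + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                           # coefficient vector selecting He_k
        he_k = hermeval(nodes, basis)
        coeffs.append(float(np.sum(weights * values * he_k)) / math.factorial(k))
    return coeffs


def information_exponent(sigma, max_degree=10, tol=1e-8):
    """Index of the first nonzero Hermite coefficient with k >= 1 (assumed convention)."""
    coeffs = hermite_coefficients(sigma, max_degree)
    for k in range(1, max_degree + 1):           # skip c_0, the mean of sigma(Z)
        if abs(coeffs[k]) > tol:
            return k
    return None                                  # nothing found up to max_degree
```

For instance, $z^3 = He_3(z) + 3He_1(z)$ gives $k^\star = 1$, $z^2 = He_2(z) + 1$ gives $k^\star = 2$, and $He_4(z) = z^4 - 6z^2 + 3$ gives $k^\star = 4$. For non-polynomial links such as $\tanh$ the quadrature only approximates the coefficients, so the tolerance choice matters there.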
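
Entry 7 in the list above refers to the classical condition that gradient descent with step size $\eta$ is stable when the sharpness $S(\theta)$ is at most $2/\eta$. The threshold can be seen on a one-dimensional quadratic with curvature $S$, a standard textbook case used here only as an illustration and not as the paper's analysis:

$$
\begin{aligned}
f(\theta) &= \tfrac{S}{2}\,\theta^2, \qquad f''(\theta) = S > 0,\\
\theta_{t+1} &= \theta_t - \eta f'(\theta_t) = (1 - \eta S)\,\theta_t
\;\Longrightarrow\; \theta_t = (1 - \eta S)^t\,\theta_0,\\
f(\theta_t) &= (1 - \eta S)^{2t}\, f(\theta_0).
\end{aligned}
$$

The loss therefore decreases monotonically exactly when $|1 - \eta S| < 1$, i.e. $0 < S < 2/\eta$. At $S = 2/\eta$ the iterate flips sign with constant magnitude, and for $S > 2/\eta$ it oscillates with growing amplitude.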