
Showing 1–50 of 77 results for author: Oymak, S

  1. arXiv:2404.13082  [pdf, other]

    cs.CL cs.AI cs.LG

    TREACLE: Thrifty Reasoning via Context-Aware LLM and Prompt Selection

    Authors: Xuechen Zhang, Zijian Huang, Ege Onur Taga, Carlee Joe-Wong, Samet Oymak, Jiasi Chen

    Abstract: Recent successes in natural language processing have led to the proliferation of large language models (LLMs) by multiple providers. Each LLM offering has different inference accuracy, monetary cost, and latency, and their accuracy further depends on the exact wording of the question (i.e., the specific prompt). At the same time, users often have a limit on monetary budget and latency to answer al…

    Submitted 17 April, 2024; originally announced April 2024.

  2. arXiv:2403.08081  [pdf, other]

    cs.LG cs.AI cs.CL math.OC

    Mechanics of Next Token Prediction with Self-Attention

    Authors: Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak

    Abstract: Transformer-based language models are trained on large datasets to predict the next token given an input sequence. Despite this simple training objective, they have led to revolutionary advances in natural language processing. Underlying this success is the self-attention mechanism. In this work, we ask: $\textit{What}$ $\textit{does}$ $\textit{a}$ $\textit{single}$ $\textit{self-attention}$…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted to AISTATS 2024

  3. arXiv:2402.13512  [pdf, other]

    cs.LG cs.AI cs.CL

    From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers

    Authors: M. Emrullah Ildiz, Yixiao Huang, Yingcong Li, Ankit Singh Rawat, Samet Oymak

    Abstract: Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation. In this work, we study learning a 1-layer self-attention model from a set of prompts and associated output data sampled from the model. We first establish a precise mapping between the self-attention mechanism and Markov models: Inputting a prompt to the model…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 30 pages

  4. arXiv:2402.08769  [pdf, other]

    cs.LG cs.DC

    FLASH: Federated Learning Across Simultaneous Heterogeneities

    Authors: Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury

    Abstract: The key premise of federated learning (FL) is to train ML models across a diverse set of data-owners (clients), without exchanging local data. An overarching challenge to this date is client heterogeneity, which may arise not only from variations in data distribution, but also in data quality, as well as compute/communication latency. An integrated view of these diverse and concurrent sources of h…

    Submitted 13 February, 2024; originally announced February 2024.

  5. arXiv:2402.04248  [pdf, other]

    cs.LG

    Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

    Authors: Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos

    Abstract: State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention. Although SSMs exhibit competitive performance, their in-context learning (ICL) capabilities, a remarkable emergent property of mo…

    Submitted 25 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Changes in v2: experiments on formal language ICL and explorations of width vs. depth on ICL; code repo available (24 pages, 10 figures)

  6. arXiv:2401.14343  [pdf, other]

    cs.LG cs.CY stat.ML

    Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective

    Authors: Xuechen Zhang, Mingchen Li, Jiasi Chen, Christos Thrampoulidis, Samet Oymak

    Abstract: Modern classification problems exhibit heterogeneities across individual classes: Each class may have unique attributes, such as sample size, label quality, or predictability (easy vs difficult), and variable importance at test-time. Without care, these heterogeneities impede the learning process, most notably, when optimizing fairness objectives. Confirming this, under a gaussian mixture setting,…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 15 pages, 8 figures

  7. arXiv:2401.04130  [pdf, other]

    cs.LG cs.AI

    Plug-and-Play Transformer Modules for Test-Time Adaptation

    Authors: Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury

    Abstract: Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model. However, the number of domains encountered during test time can be very large, and the data is usually unlabeled. Thus, adaptation to new domains is challenging; it is also impractical to generate…

    Submitted 8 February, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  8. arXiv:2401.02561  [pdf, other]

    cs.LG

    MeTA: Multi-source Test Time Adaptation

    Authors: Sk Miraj Ahmed, Fahim Faisal Niloy, Dripta S. Raychaudhuri, Samet Oymak, Amit K. Roy-Chowdhury

    Abstract: Test time adaptation is the process of adapting, in an unsupervised manner, a pre-trained source model to each incoming batch of the test data (i.e., without requiring a substantial portion of the test data to be available, as in traditional domain adaptation) and without access to the source data. Since it works with each batch of test data, it is well-suited for dynamic environments where decisi…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Under Review

  9. arXiv:2312.07851  [pdf, ps, other]

    cs.LG eess.SY math.OC

    Noise in the reverse process improves the approximation capabilities of diffusion models

    Authors: Karthik Elamvazhuthi, Samet Oymak, Fabio Pasqualetti

    Abstract: In Score based Generative Modeling (SGMs), the state-of-the-art in generative modeling, stochastic reverse processes are known to perform better than their deterministic counterparts. This paper delves into the heart of this phenomenon, comparing neural ordinary differential equations (ODEs) and neural stochastic differential equations (SDEs) as reverse processes. We use a control theoretic perspe…

    Submitted 13 December, 2023; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Extended preprint for submission to Learning for Dynamics & Control Conference

  10. arXiv:2311.04991  [pdf, other]

    cs.LG cs.CV

    Effective Restoration of Source Knowledge in Continual Test Time Adaptation

    Authors: Fahim Faisal Niloy, Sk Miraj Ahmed, Dripta S. Raychaudhuri, Samet Oymak, Amit K. Roy-Chowdhury

    Abstract: Traditional test-time adaptation (TTA) methods face significant challenges in adapting to dynamic environments characterized by continuously changing long-term target distributions. These challenges primarily stem from two factors: catastrophic forgetting of previously learned valuable source knowledge and gradual error accumulation caused by miscalibrated pseudo labels. To address these issues, t…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  11. arXiv:2308.16898  [pdf, other]

    cs.LG cs.AI cs.CL math.OC

    Transformers as Support Vector Machines

    Authors: Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak

    Abstract: Since its inception in "Attention Is All You Need", transformer architecture has led to revolutionary advancements in NLP. The attention layer within the transformer admits a sequence of input tokens $X$ and makes them interact through pairwise similarities computed as softmax$(XQK^\top X^\top)$, where $(K,Q)$ are the trainable key-query parameters. In this work, we establish a formal equivalence…

    Submitted 22 February, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: The proof of global convergence for gradient descent in the equal score setting has been fixed, referring to Theorem 2 of [TLZO23], and the experimental results have been extended

  12. arXiv:2308.08536  [pdf, other]

    eess.SY cs.AI cs.LG

    Can Transformers Learn Optimal Filtering for Unknown Systems?

    Authors: Haldun Balim, Zhe Du, Samet Oymak, Necmiye Ozay

    Abstract: Transformer models have shown great success in natural language processing; however, their potential remains mostly unexplored for dynamical systems. In this work, we investigate the optimal output estimation problem using transformers, which generate output predictions using all the past ones. Particularly, we train the transformer using various distinct systems and then evaluate the performance…

    Submitted 11 June, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Minor differences between the implementation and the originally provided descriptions are corrected, ensuring better clarity and accuracy of the content

  13. arXiv:2307.04905  [pdf, other]

    cs.LG cs.DC

    FedYolo: Augmenting Federated Learning with Pretrained Transformers

    Authors: Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak

    Abstract: The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to address these issues, it has challenges hindering a unified solution. Large transformer models have been shown to work across a variety of tasks achieving remarkab…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: 20 pages, 18 figures

  14. arXiv:2306.13596  [pdf, other]

    cs.LG cs.AI cs.CL math.OC

    Max-Margin Token Selection in Attention Mechanism

    Authors: Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak

    Abstract: Attention mechanism is a central component of the transformer architecture which led to the phenomenal success of large language models. However, the theoretical principles underlying the attention mechanism are poorly understood, especially its nonconvex optimization dynamics. In this work, we explore the seminal softmax-attention model…

    Submitted 8 December, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Revised proof of Theorem 2 - Gradient descent path globally converges only when n=1

  15. arXiv:2306.03435  [pdf, other]

    cs.LG cs.CL stat.ML

    On the Role of Attention in Prompt-tuning

    Authors: Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis

    Abstract: Prompt-tuning is an emerging strategy to adapt large language models (LLM) to downstream tasks by learning a (soft-)prompt parameter from data. Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting. In this work, we explore prompt-tuning for one-layer attention architectures and study contextual mi…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: Published at ICML 2023

  16. arXiv:2306.01648  [pdf, other]

    cs.LG cs.DC

    Federated Multi-Sequence Stochastic Approximation with Local Hypergradient Estimation

    Authors: Davoud Ataee Tarzanagh, Mingchen Li, Pranay Sharma, Samet Oymak

    Abstract: Stochastic approximation with multiple coupled sequences (MSA) has found broad applications in machine learning as it encompasses a rich class of problems including bilevel optimization (BLO), multi-level compositional optimization (MCO), and reinforcement learning (specifically, actor-critic methods). However, designing provably-efficient federated algorithms for MSA has been an elusive question…

    Submitted 2 June, 2023; originally announced June 2023.

  17. arXiv:2305.18869  [pdf, other]

    cs.LG cs.AI cs.CL

    Dissecting Chain-of-Thought: Compositionality through In-Context Filtering and Learning

    Authors: Yingcong Li, Kartik Sreenivasan, Angeliki Giannou, Dimitris Papailiopoulos, Samet Oymak

    Abstract: Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. In an attempt to shed light on this, our study investigates the impact of CoT on the ability of transformers to in-context learn a simple to study, yet general family of compositi…

    Submitted 7 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted for NeurIPS 2023. Changes in this version: refined title, restructured content, included new out-of-distribution experiments, and code now available

  18. arXiv:2305.08849  [pdf, other]

    math.OC cs.LG eess.SY

    Learning on Manifolds: Universal Approximations Properties using Geometric Controllability Conditions for Neural ODEs

    Authors: Karthik Elamvazhuthi, Xuechen Zhang, Samet Oymak, Fabio Pasqualetti

    Abstract: In numerous robotics and mechanical engineering applications, among others, data is often constrained on smooth manifolds due to the presence of rotational degrees of freedom. Common data-driven and learning-based methods such as neural ordinary differential equations (ODEs), however, typically fail to satisfy these manifold constraints and perform poorly for these applications. To address this sho…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: Extended preprint. Accepted for oral presentation at 5th Annual Learning for Dynamics & Control Conference

  19. arXiv:2303.04338  [pdf, other]

    cs.LG stat.ML

    Provable Pathways: Learning Multiple Tasks over Multiple Paths

    Authors: Yingcong Li, Samet Oymak

    Abstract: Constructing useful representations across a large number of tasks is a key requirement for sample-efficient intelligent systems. A traditional idea in multitask learning (MTL) is building a shared representation across tasks which can then be adapted to new tasks by tuning last layers. A desirable refinement of using a shared one-fits-all representation is to construct task-specific representatio…

    Submitted 7 March, 2023; originally announced March 2023.

  20. arXiv:2302.00814  [pdf, other]

    cs.LG cs.AI stat.ML

    Stochastic Contextual Bandits with Long Horizon Rewards

    Authors: Yuzhen Qin, Yingcong Li, Fabio Pasqualetti, Maryam Fazel, Samet Oymak

    Abstract: The growing interest in complex decision-making and language modeling problems highlights the importance of sample-efficient learning over very long horizons. This work takes a step in this direction by investigating contextual linear bandits where the current reward depends on at most $s$ prior actions and contexts (not necessarily consecutive), up to a time horizon of $h$. In order to avoid poly…

    Submitted 3 February, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: 47 pages, to appear at AAAI 2023

  21. arXiv:2301.07067  [pdf, other]

    cs.LG cs.CL stat.ML

    Transformers as Algorithms: Generalization and Stability in In-context Learning

    Authors: Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak

    Abstract: In-context learning (ICL) is a type of prompting where a transformer model operates on a sequence of (input, output) examples and performs inference on-the-fly. In this work, we formalize in-context learning as an algorithm learning problem where a transformer model implicitly constructs a hypothesis function at inference-time. We first explore the statistical aspects of this abstraction through t…

    Submitted 6 February, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

    Comments: Revised version significantly improves the stability guarantees and provides new experiments

  22. arXiv:2208.13915  [pdf, other]

    cs.LG eess.SY math.OC stat.ML

    Finite Sample Identification of Bilinear Dynamical Systems

    Authors: Yahya Sattar, Samet Oymak, Necmiye Ozay

    Abstract: Bilinear dynamical systems are ubiquitous in many different domains and they can also be used to approximate more general control-affine systems. This motivates the problem of learning bilinear systems from a single trajectory of the system's states and inputs. Under a mild marginal mean-square stability assumption, we identify how much data is needed to estimate the unknown bilinear system up to…

    Submitted 29 August, 2022; originally announced August 2022.

  23. arXiv:2205.05820  [pdf, other]

    cs.LG eess.SY

    Representation Learning for Context-Dependent Decision-Making

    Authors: Yuzhen Qin, Tommaso Menara, Samet Oymak, ShiNung Ching, Fabio Pasqualetti

    Abstract: Humans are capable of adjusting to changing environments flexibly and quickly. Empirical evidence has revealed that representation learning plays a crucial role in endowing humans with such a capability. Inspired by this observation, we study representation learning in the sequential decision-making scenario with contextual changes. We propose an online algorithm that is able to learn and transfer…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: 3 Figures

  24. arXiv:2205.02215  [pdf, other]

    cs.LG math.OC

    FedNest: Federated Bilevel, Minimax, and Compositional Optimization

    Authors: Davoud Ataee Tarzanagh, Mingchen Li, Christos Thrampoulidis, Samet Oymak

    Abstract: Standard federated optimization methods successfully apply to stochastic problems with single-level structure. However, many contemporary ML problems -- including adversarial robustness, hyperparameter tuning, and actor-critic -- fall under nested bilevel programming that subsumes minimax and compositional optimization. In this work, we propose \fedblo: A federated alternating stochastic gradient…

    Submitted 13 September, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: ICML 2022 (accepted as a long presentation), 34 pages, 6 figures

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:21146-21179, 2022

  25. arXiv:2203.16673  [pdf, other]

    stat.ML cs.LG eess.SY math.DS math.OC

    System Identification via Nuclear Norm Regularization

    Authors: Yue Sun, Samet Oymak, Maryam Fazel

    Abstract: This paper studies the problem of identifying low-order linear systems via Hankel nuclear norm regularization. Hankel regularization encourages the low-rankness of the Hankel matrix, which maps to the low-orderness of the system. We provide novel statistical analysis for this regularization and carefully contrast it with the unregularized ordinary least-squares (OLS) estimator. Our analysis leads…

    Submitted 30 March, 2022; originally announced March 2022.

  26. arXiv:2203.02026  [pdf, other]

    cs.LG

    Provable and Efficient Continual Representation Learning

    Authors: Yingcong Li, Mingchen Li, M. Salman Asif, Samet Oymak

    Abstract: In continual learning (CL), the goal is to design models that can learn a sequence of tasks without catastrophic forgetting. While there is a rich set of techniques for CL, relatively little understanding exists on how representations built by previous tasks benefit new tasks that are added to the network. To address this, we study the problem of continual representation learning (CRL) where we le…

    Submitted 7 November, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

  27. arXiv:2201.06142  [pdf, other]

    cs.LG stat.ML

    Towards Sample-efficient Overparameterized Meta-learning

    Authors: Yue Sun, Adhyyan Narang, Halil Ibrahim Gulluk, Samet Oymak, Maryam Fazel

    Abstract: An overarching goal in machine learning is to build a generalizable model with few samples. To this end, overparameterization has been the subject of immense interest to explain the generalization ability of deep nets even when the size of the dataset is smaller than that of the model. While the prior literature focuses on the classical supervised setting, this paper aims to demystify overparamete…

    Submitted 16 January, 2022; originally announced January 2022.

    Journal ref: Advances in Neural Information Processing Systems, 34 (2021)

  28. arXiv:2201.04805  [pdf, other]

    cs.LG eess.SY math.OC

    Non-Stationary Representation Learning in Sequential Linear Bandits

    Authors: Yuzhen Qin, Tommaso Menara, Samet Oymak, ShiNung Ching, Fabio Pasqualetti

    Abstract: In this paper, we study representation learning for multi-task decision-making in non-stationary environments. We consider the framework of sequential linear bandits, where the agent performs a series of tasks drawn from distinct sets associated with different environments. The embeddings of tasks in each set share a low-dimensional feature extractor called representation, and representations are…

    Submitted 16 April, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

    Comments: 24 pages, 7 figures

  29. arXiv:2201.01212  [pdf, other]

    cs.LG

    AutoBalance: Optimized Loss Functions for Imbalanced Data

    Authors: Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, Samet Oymak

    Abstract: Imbalanced datasets are commonplace in modern machine learning problems. The presence of under-represented classes or groups with sensitive attributes results in concerns about generalization and fairness. Such concerns are further exacerbated by the fact that large capacity deep nets can perfectly fit the training data and appear to achieve perfect accuracy and fairness during training, but perfo…

    Submitted 4 January, 2022; originally announced January 2022.

  30. arXiv:2111.07018  [pdf, other]

    cs.LG eess.SY math.OC stat.ML

    Identification and Adaptive Control of Markov Jump Systems: Sample Complexity and Regret Bounds

    Authors: Yahya Sattar, Zhe Du, Davoud Ataee Tarzanagh, Laura Balzano, Necmiye Ozay, Samet Oymak

    Abstract: Learning how to effectively control unknown dynamical systems is crucial for intelligent autonomous systems. This task becomes a significant challenge when the underlying dynamics are changing with time. Motivated by this challenge, this paper considers the problem of controlling an unknown Markov jump linear system (MJS) to optimize a quadratic objective. By taking a model-based perspective, we c…

    Submitted 12 November, 2021; originally announced November 2021.

  31. arXiv:2110.02459  [pdf, other]

    cs.CV cs.LG

    Post-hoc Models for Performance Estimation of Machine Learning Inference

    Authors: Xuechen Zhang, Samet Oymak, Jiasi Chen

    Abstract: Estimating how well a machine learning model performs during inference is critical in a variety of scenarios (for example, to quantify uncertainty, or to choose from a library of available models). However, the standard accuracy estimate of softmax confidence is not versatile and cannot reliably predict different performance metrics (e.g., F1-score, recall) or the performance in different applicat…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: 10 pages, 9 figures

    MSC Class: ACM-class: I.4; I.4.8.e; I.1.2.e

  32. arXiv:2105.12358  [pdf, other]

    math.OC cs.LG eess.SY

    Certainty Equivalent Quadratic Control for Markov Jump Systems

    Authors: Zhe Du, Yahya Sattar, Davoud Ataee Tarzanagh, Laura Balzano, Samet Oymak, Necmiye Ozay

    Abstract: Real-world control applications often involve complex dynamics subject to abrupt changes or variations. Markov jump linear systems (MJS) provide a rich framework for modeling such dynamics. Despite an extensive history, theoretical understanding of parameter sensitivities of MJS control is somewhat lacking. Motivated by this, we investigate robustness aspects of certainty equivalent model-based op…

    Submitted 26 May, 2021; originally announced May 2021.

    Comments: 17 pages, 8 figures

  33. arXiv:2104.14132  [pdf, other]

    stat.ML cs.LG

    Generalization Guarantees for Neural Architecture Search with Train-Validation Split

    Authors: Samet Oymak, Mingchen Li, Mahdi Soltanolkotabi

    Abstract: Neural Architecture Search (NAS) is a popular method for automatically designing optimized architectures for high-performance deep learning. In this approach, it is common to use bilevel optimization where one optimizes the model weights over the training data (inner problem) and various hyperparameters such as the configuration of the architecture over the validation data (outer problem). This pa…

    Submitted 2 March, 2022; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: ICML 2021

  34. arXiv:2104.01845  [pdf, other]

    cs.LG cs.CV

    Unsupervised Multi-source Domain Adaptation Without Access to Source Data

    Authors: Sk Miraj Ahmed, Dripta S. Raychaudhuri, Sujoy Paul, Samet Oymak, Amit K. Roy-Chowdhury

    Abstract: Unsupervised Domain Adaptation (UDA) aims to learn a predictor model for an unlabeled domain by transferring knowledge from a separate labeled source domain. However, most of these conventional UDA approaches make the strong assumption of having access to the source data during training, which may not be very practical due to privacy, security and storage concerns. A recent line of work addressed…

    Submitted 5 April, 2021; originally announced April 2021.

    Comments: This paper will appear at CVPR 2021

  35. arXiv:2103.01550  [pdf, other]

    cs.LG stat.ML

    Label-Imbalanced and Group-Sensitive Classification under Overparameterization

    Authors: Ganesh Ramachandra Kini, Orestis Paraskevas, Samet Oymak, Christos Thrampoulidis

    Abstract: The goal in label-imbalanced and group-sensitive classification is to optimize relevant metrics such as balanced error and equal opportunity. Classical methods, such as weighted cross-entropy, fail when training deep nets to the terminal phase of training (TPT), that is training beyond zero training error. This observation has motivated recent flurry of activity in developing heuristic alternative…

    Submitted 8 November, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

  36. Provable Super-Convergence with a Large Cyclical Learning Rate

    Authors: Samet Oymak

    Abstract: Conventional wisdom dictates that learning rate should be in the stable regime so that gradient-based algorithms don't blow up. This letter introduces a simple scenario where an unstably large learning rate scheme leads to a super fast convergence, with the convergence rate depending only logarithmically on the condition number of the problem. Our scheme uses a Cyclical Learning Rate (CLR) where w…

    Submitted 9 August, 2021; v1 submitted 21 February, 2021; originally announced February 2021.

  37. arXiv:2102.07206  [pdf, other]

    cs.LG stat.ML

    Sample Efficient Subspace-based Representations for Nonlinear Meta-Learning

    Authors: Halil Ibrahim Gulluk, Yue Sun, Samet Oymak, Maryam Fazel

    Abstract: Constructing good representations is critical for learning complex tasks in a sample efficient manner. In the context of meta-learning, representations can be constructed from common patterns of previously seen tasks so that a future task can be learned quickly. While recent works show the benefit of subspace-based representations, such results are limited to linear-regression tasks. This work exp…

    Submitted 26 February, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

    Comments: To appear in ICASSP 21'

  38. arXiv:2012.08749  [pdf, other]

    cs.LG stat.ML

    Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

    Authors: Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis

    Abstract: Deep networks are typically trained with many more parameters than the size of the training dataset. Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists - perhaps counterintuitively - building lightweight models. Specifically, it suggests that overparameterization benefits model pruning / sparsification. This paper…

    Submitted 16 December, 2020; originally announced December 2020.

    Comments: to appear at AAAI 2021

  39. arXiv:2011.08121  [pdf, other]

    cs.LG

    On the Marginal Benefit of Active Learning: Does Self-Supervision Eat Its Cake?

    Authors: Yao-Chun Chan, Mingchen Li, Samet Oymak

    Abstract: Active learning is the set of techniques for intelligently labeling large unlabeled datasets to reduce the labeling effort. In parallel, recent developments in self-supervised and semi-supervised learning (S4L) provide powerful techniques, based on data-augmentation, contrastive learning, and self-training, that enable superior utilization of unlabeled data which led to a significant reduction in…

    Submitted 16 November, 2020; originally announced November 2020.

  40. arXiv:2011.07729  [pdf, other]

    cs.LG math.ST stat.ML

    Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View

    Authors: Christos Thrampoulidis, Samet Oymak, Mahdi Soltanolkotabi

    Abstract: Contemporary machine learning applications often involve classification tasks with many classes. Despite their extensive use, a precise understanding of the statistical properties and behavior of classification algorithms is still missing, especially in modern regimes where the number of classes is rather large. In this paper, we take a step in this direction by providing the first asymptotically…

    Submitted 16 November, 2020; originally announced November 2020.

    Comments: To Appear at NeurIPS 2020. 62 pages, 7 figures

  41. arXiv:2007.02244  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Unsupervised Paraphrasing via Deep Reinforcement Learning

    Authors: A. B. Siddique, Samet Oymak, Vagelis Hristidis

    Abstract: Paraphrasing is expressing the meaning of an input sentence in different wording while maintaining fluency (i.e., grammatical and syntactical correctness). Most existing work on paraphrasing use supervised models that are limited to specific domains (e.g., image captions). Such models can neither be straightforwardly transferred to other domains nor generalize well, and creating labeled training d…

    Submitted 5 July, 2020; originally announced July 2020.

  42. arXiv:2006.11006  [pdf, other]

    cs.LG stat.ML

    Statistical and Algorithmic Insights for Semi-supervised Learning with Self-training

    Authors: Samet Oymak, Talha Cihad Gulcu

    Abstract: Self-training is a classical approach in semi-supervised learning which is successfully applied to a variety of machine learning problems. Self-training algorithm generates pseudo-labels for the unlabeled examples and progressively refines these pseudo-labels which hopefully coincides with the actual labels. This work provides theoretical insights into self-training algorithm with a focus on linea…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: 25 pages

  43. arXiv:2006.10903  [pdf, other]

    cs.LG stat.ML

    Exploring Weight Importance and Hessian Bias in Model Pruning

    Authors: Mingchen Li, Yahya Sattar, Christos Thrampoulidis, Samet Oymak

    Abstract: Model pruning is an essential procedure for building compact and computationally-efficient machine learning models. A key feature of a good pruning algorithm is that it accurately quantifies the relative importance of the model weights. While model pruning has a rich history, we still don't have a full grasp of the pruning mechanics even for relatively simple problems involving linear models or sh…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: 28 pages

  44. arXiv:2002.09831  [pdf, other]

    cs.LG stat.ML

    On the Role of Dataset Quality and Heterogeneity in Model Confidence

    Authors: Yuan Zhao, Jiasi Chen, Samet Oymak

    Abstract: Safety-critical applications require machine learning models that output accurate and calibrated probabilities. While uncalibrated deep networks are known to make over-confident predictions, it is unclear how model confidence is impacted by the variations in the data, such as label noise or class size. In this paper, we investigate the role of the dataset quality by studying the impact of dataset…

    Submitted 23 February, 2020; originally announced February 2020.

    Comments: 25 pages, 14 figures

  45. arXiv:2002.08538  [pdf, other]

    cs.LG eess.SY math.OC stat.AP stat.ML

    Non-asymptotic and Accurate Learning of Nonlinear Dynamical Systems

    Authors: Yahya Sattar, Samet Oymak

    Abstract: We consider the problem of learning stabilizable systems governed by the nonlinear state equation $h_{t+1}=φ(h_t,u_t;θ)+w_t$. Here $θ$ is the unknown system dynamics, $h_t$ is the state, $u_t$ is the input and $w_t$ is the additive noise vector. We study gradient-based algorithms to learn the system dynamics $θ$ from samples obtained from a single finite trajectory. If the system is run by a stabiliz…

    Submitted 17 November, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: presentation improved, proof sketch added, Assumption 2(b) removed, references added

  46. arXiv:1907.01728  [pdf, other]

    cs.LG cs.IT stat.ML

    Quickly Finding the Best Linear Model in High Dimensions

    Authors: Yahya Sattar, Samet Oymak

    Abstract: We study the problem of finding the best linear model that can minimize least-squares loss given a dataset. While this problem is trivial in the low dimensional regime, it becomes more interesting in high dimensions where the population minimizer is assumed to lie on a manifold such as sparse vectors. We propose a projected gradient descent (PGD) algorithm to estimate the population minimizer in th…

    Submitted 3 July, 2019; originally announced July 2019.

    Journal ref: IEEE Transactions on Signal Processing, 2020

  47. arXiv:1906.05392  [pdf, other]

    cs.LG math.OC stat.ML

    Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian

    Authors: Samet Oymak, Zalan Fabian, Mingchen Li, Mahdi Soltanolkotabi

    Abstract: Modern neural network architectures often generalize well despite containing many more parameters than the size of the training dataset. This paper explores the generalization capabilities of neural networks trained via gradient descent. We develop a data-dependent optimization and generalization theory which leverages the low-rank structure of the Jacobian matrix associated with the network. Our…

    Submitted 3 July, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

  48. arXiv:1903.11680  [pdf, other]

    cs.LG stat.ML

    Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks

    Authors: Mingchen Li, Mahdi Soltanolkotabi, Samet Oymak

    Abstract: Modern neural networks are typically trained in an over-parameterized regime where the parameters of the model far exceed the size of the training data. Such neural networks in principle have the capacity to (over)fit any set of labels including pure noise. Despite this, somewhat paradoxically, neural network models trained via first-order methods continue to predict well on yet unseen test data.…

    Submitted 3 July, 2019; v1 submitted 27 March, 2019; originally announced March 2019.

  49. arXiv:1902.04674  [pdf, other]

    cs.LG cs.IT math.OC stat.ML

    Towards moderate overparameterization: global convergence guarantees for training shallow neural networks

    Authors: Samet Oymak, Mahdi Soltanolkotabi

    Abstract: Many modern neural network architectures are trained in an overparameterized regime where the parameters of the model exceed the size of the training dataset. Sufficiently overparameterized neural network architectures in principle have the capacity to fit any set of labels including random noise. However, given the highly nonconvex nature of the training landscape, it is not clear what level and k…

    Submitted 12 February, 2019; originally announced February 2019.

  50. arXiv:1812.10004  [pdf, other]

    cs.LG math.OC stat.ML

    Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?

    Authors: Samet Oymak, Mahdi Soltanolkotabi

    Abstract: Many modern learning tasks involve fitting nonlinear models to data which are trained in an overparameterized regime where the parameters of the model exceed the size of the training dataset. Due to this overparameterization, the training loss may have infinitely many global minima, and it is critical to understand the properties of the solutions found by first-order optimization schemes such as (s…

    Submitted 24 December, 2018; originally announced December 2018.