
Showing 1–13 of 13 results for author: Karakus, C

Searching in archive cs.
  1. arXiv:2401.08893  [pdf, other]

    cs.LG math.OC

    MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

    Authors: Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher

    Abstract: Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during tra…

    Submitted 17 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  2. arXiv:2111.05972  [pdf, other]

    cs.LG cs.AI cs.DC

    Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training

    Authors: Can Karakus, Rahul Huilgol, Fei Wu, Anirudh Subramanian, Cade Daniel, Derya Cavdar, Teng Xu, Haohan Chen, Arash Rahnama, Luis Quintela

    Abstract: With deep learning models rapidly growing in size, systems-level solutions for large-model training are required. We present Amazon SageMaker model parallelism, a software library that integrates with PyTorch, and enables easy training of large models using model parallelism and other memory-saving features. In contrast to existing solutions, the implementation of the SageMaker library is much mor…

    Submitted 10 November, 2021; originally announced November 2021.

    Comments: 24 pages. Submitted for review

  3. arXiv:1906.02367  [pdf, other]

    stat.ML cs.DC cs.LG math.OC

    Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations

    Authors: Debraj Basu, Deepesh Data, Can Karakus, Suhas Diggavi

    Abstract: Communication bottleneck has been identified as a significant issue in distributed optimization of large-scale learning models. Recently, several approaches to mitigate this problem have been proposed, including different forms of gradient compression or computing local models and mixing them iteratively. In this paper, we propose the Qsparse-local-SGD algorithm, which combines aggressive spars…

    Submitted 2 November, 2019; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: 50 pages; 8 figures; full version of a paper in NeurIPS 2019 with the same title

  4. arXiv:1905.04035  [pdf, other]

    cs.LG cs.CL cs.DC

    Densifying Assumed-sparse Tensors: Improving Memory Efficiency and MPI Collective Performance during Tensor Accumulation for Parallelized Training of Neural Machine Translation Models

    Authors: Derya Cavdar, Valeriu Codreanu, Can Karakus, John A. Lockman III, Damian Podareanu, Vikram Saletore, Alexander Sergeev, Don D. Smith II, Victor Suthichai, Quy Ta, Srinivas Varadharajan, Lucas A. Wilson, Rengan Xu, Pei Yang

    Abstract: Neural machine translation - using neural networks to translate human language - is an area of active research exploring new neuron types and network topologies with the goal of dramatically improving machine translation performance. Current state-of-the-art approaches, such as the multi-head attention-based transformer, require very large translation corpuses and many epochs to produce models of…

    Submitted 10 May, 2019; originally announced May 2019.

    Comments: 18 pages, 10 figures, accepted at the 2019 International Supercomputing Conference

  5. arXiv:1903.07792  [pdf, other]

    cs.LG cs.CR cs.SI math.OC stat.ML

    Differentially Private Consensus-Based Distributed Optimization

    Authors: Mehrdad Showkatbakhsh, Can Karakus, Suhas Diggavi

    Abstract: Data privacy is an important concern in learning, when datasets contain sensitive information about individuals. This paper considers consensus-based distributed optimization under data privacy constraints. Consensus-based optimization consists of a set of computational nodes arranged in a graph, each having a local objective that depends on their local data, where in every step nodes take a linea…

    Submitted 18 March, 2019; originally announced March 2019.

  6. arXiv:1902.04688  [pdf, ps, other]

    cs.LG cs.CR cs.IT stat.ML

    Privacy-Utility Trade-off of Linear Regression under Random Projections and Additive Noise

    Authors: Mehrdad Showkatbakhsh, Can Karakus, Suhas Diggavi

    Abstract: Data privacy is an important concern in machine learning, and is fundamentally at odds with the task of training useful learning models, which typically require the acquisition of large amounts of private user data. One possible way of fulfilling the machine learning task while preserving user privacy is to train the model on a transformed, noisy version of the data, which does not reveal the data…

    Submitted 12 February, 2019; originally announced February 2019.

    Comments: A short version is published in ISIT 2018

  7. arXiv:1803.05397  [pdf, other]

    stat.ML cs.DC cs.LG math.OC

    Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning

    Authors: Can Karakus, Yifan Sun, Suhas Diggavi, Wotao Yin

    Abstract: Performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework where the dataset is "encoded" to have an over-complete representation with built-in redundancy, and the straggling nodes in the system are dynamically left out of the computation at ev…

    Submitted 14 March, 2018; originally announced March 2018.

    Comments: 39 pages, 14 figures. Submitted for publication

  8. arXiv:1711.04969  [pdf, other]

    stat.ML cs.DC cs.IT cs.LG

    Straggler Mitigation in Distributed Optimization Through Data Encoding

    Authors: Can Karakus, Yifan Sun, Suhas Diggavi, Wotao Yin

    Abstract: Slow running or straggler tasks can significantly reduce computation speed in distributed computation. Recently, coding-theory-inspired approaches have been applied to mitigate the effect of straggling, through embedding redundancy in certain linear computational steps of the optimization algorithm, thus completing the computation without waiting for the stragglers. In this paper, we propose an al…

    Submitted 22 January, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: appeared at NIPS 2017

  9. arXiv:1706.03659  [pdf, other]

    cs.IT

    Approximate Capacity of Fast Fading Interference Channels with No Instantaneous CSIT

    Authors: Joyson Sebastian, Can Karakus, Suhas Diggavi

    Abstract: We develop a characterization of fading models, which assigns a number called logarithmic Jensen's gap to a given fading model. We show that as a consequence of a finite logarithmic Jensen's gap, approximate capacity region can be obtained for fast fading interference channels (FF-IC) for several scenarios. We illustrate three instances where a constant capacity gap can be obtained as a function o…

    Submitted 3 June, 2018; v1 submitted 12 June, 2017; originally announced June 2017.

    Comments: Minor typos corrected

  10. arXiv:1604.06151  [pdf, other]

    cs.IT cs.NI

    Enhancing Multiuser MIMO Through Opportunistic D2D Cooperation

    Authors: Can Karakus, Suhas Diggavi

    Abstract: We propose a cellular architecture that combines multiuser MIMO (MU-MIMO) downlink with opportunistic use of unlicensed ISM bands to establish device-to-device (D2D) cooperation. The architecture consists of a physical-layer cooperation scheme based on forming downlink virtual MIMO channels through D2D relaying, and a novel resource allocation strategy for such D2D-enabled networks. We prove the a…

    Submitted 6 March, 2017; v1 submitted 20 April, 2016; originally announced April 2016.

    Comments: 43 pages, 18 figures. Submitted for publication

  11. Opportunistic Scheduling for Full-Duplex Uplink-Downlink Networks

    Authors: Can Karakus, Suhas Diggavi

    Abstract: We study opportunistic scheduling and the sum capacity of cellular networks with a full-duplex multi-antenna base station and a large number of single-antenna half-duplex users. Simultaneous uplink and downlink over the same band results in uplink-to-downlink interference, degrading performance. We present a simple opportunistic joint uplink-downlink scheduling algorithm that exploits multiuser di…

    Submitted 22 April, 2015; originally announced April 2015.

    Comments: 10 pages, 2 figures, to appear at IEEE International Symposium on Information Theory (ISIT) '15

  12. Gaussian Interference Channel with Intermittent Feedback

    Authors: Can Karakus, I-Hsiang Wang, Suhas Diggavi

    Abstract: We investigate how to exploit intermittent feedback for interference management by studying the two-user Gaussian interference channel (IC). We approximately characterize (within a universal constant) the capacity region for the Gaussian IC with intermittent feedback. We exactly characterize the capacity region of the linear deterministic version of the problem, which gives us insight into the…

    Submitted 30 August, 2015; v1 submitted 20 August, 2014; originally announced August 2014.

    Comments: 36 pages, 12 figures, appeared in IEEE Transactions on Information Theory

  13. Interference Channel with Intermittent Feedback

    Authors: Can Karakus, I-Hsiang Wang, Suhas Diggavi

    Abstract: We investigate how to exploit intermittent feedback for interference management. Focusing on the two-user linear deterministic interference channel, we completely characterize the capacity region. We find that the characterization only depends on the forward channel parameters and the marginal probability distribution of each feedback link. The scheme we propose makes use of block Markov encoding…

    Submitted 17 May, 2013; v1 submitted 14 May, 2013; originally announced May 2013.

    Comments: Extended version of the same-titled paper that appears in IEEE International Symposium on Information Theory (ISIT) 2013