
Showing 1–8 of 8 results for author: Lau, T T

  1. arXiv:2406.13936

    stat.ML cs.LG math.OC

    Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods

    Authors: Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar

    Abstract: Modern deep neural networks often require distributed training with many workers due to their large size. As worker numbers increase, communication overheads become the main bottleneck in data-parallel minibatch stochastic gradient methods with per-iteration gradient synchronization. Local gradient methods like Local SGD reduce communication by only syncing after several local steps. Despite under…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2402.11215

    cs.LG math.OC stat.ML

    AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods

    Authors: Tim Tsz-Kit Lau, Han Liu, Mladen Kolar

    Abstract: The choice of batch sizes in minibatch stochastic gradient optimizers is critical in large-scale model training for both optimization and generalization performance. Although large-batch training is arguably the dominant training paradigm for large-scale deep learning due to hardware advances, the generalization performance of the model deteriorates compared to small-batch training, leading to the…

    Submitted 28 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  3. arXiv:2305.15988

    stat.ML cs.LG stat.CO stat.ME

    Non-Log-Concave and Nonsmooth Sampling via Langevin Monte Carlo Algorithms

    Authors: Tim Tsz-Kit Lau, Han Liu, Thomas Pock

    Abstract: We study the problem of approximate sampling from non-log-concave distributions, e.g., Gaussian mixtures, which is often challenging even in low dimensions due to their multimodality. We focus on performing this task via Markov chain Monte Carlo (MCMC) methods derived from discretizations of the overdamped Langevin diffusions, which are commonly known as Langevin Monte Carlo algorithms. Furthermor…

    Submitted 29 May, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

  4. arXiv:2207.04387

    stat.ML cs.LG stat.CO

    Bregman Proximal Langevin Monte Carlo via Bregman--Moreau Envelopes

    Authors: Tim Tsz-Kit Lau, Han Liu

    Abstract: We propose efficient Langevin Monte Carlo algorithms for sampling distributions with nonsmooth convex composite potentials, each of which is the sum of a continuously differentiable function and a possibly nonsmooth function. We devise such algorithms leveraging recent advances in convex analysis and optimization methods involving Bregman divergences, namely the Bregman--Moreau envelopes and the Bregman p…

    Submitted 10 July, 2022; originally announced July 2022.

    Comments: Proceedings of the 39th International Conference on Machine Learning (ICML), Baltimore, Maryland, USA, PMLR 162, 2022

  5. arXiv:2203.12136

    stat.ML cs.LG math.OC

    Wasserstein Distributionally Robust Optimization with Wasserstein Barycenters

    Authors: Tim Tsz-Kit Lau, Han Liu

    Abstract: In many applications in statistics and machine learning, the availability of data samples from multiple possibly heterogeneous sources has become increasingly prevalent. On the other hand, in distributionally robust optimization, we seek data-driven decisions which perform well under the most adverse distribution from a nominal distribution constructed from data samples within a certain discrepanc…

    Submitted 30 May, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

  6. arXiv:2203.07092

    cs.LG cs.MA stat.ML

    The Multi-Agent Pickup and Delivery Problem: MAPF, MARL and Its Warehouse Applications

    Authors: Tim Tsz-Kit Lau, Biswa Sengupta

    Abstract: We study two state-of-the-art solutions to the multi-agent pickup and delivery (MAPD) problem based on different principles -- multi-agent path-finding (MAPF) and multi-agent reinforcement learning (MARL). Specifically, a recent MAPF algorithm called conflict-based search (CBS) and a current MARL algorithm called shared experience actor-critic (SEAC) are studied. While the performance of these alg…

    Submitted 14 March, 2022; originally announced March 2022.

  7. arXiv:1803.09082

    stat.ML cs.LG math.OC

    A Proximal Block Coordinate Descent Algorithm for Deep Neural Network Training

    Authors: Tim Tsz-Kit Lau, Jinshan Zeng, Baoyuan Wu, Yuan Yao

    Abstract: Training deep neural networks (DNNs) efficiently is a challenge due to the associated highly nonconvex optimization. The backpropagation (backprop) algorithm has long been the most widely used algorithm for gradient computation of parameters of DNNs and is used along with gradient descent-type algorithms for this optimization task. Recent work has shown the efficiency of block coordinate descent…

    Submitted 24 March, 2018; originally announced March 2018.

    Comments: The 6th International Conference on Learning Representations (ICLR 2018), Workshop Track

  8. arXiv:1803.00225

    math.OC cs.LG stat.ML

    Global Convergence of Block Coordinate Descent in Deep Learning

    Authors: Jinshan Zeng, Tim Tsz-Kit Lau, Shaobo Lin, Yuan Yao

    Abstract: Deep learning has aroused extensive attention due to its great empirical success. The efficiency of the block coordinate descent (BCD) methods has been recently demonstrated in deep neural network (DNN) training. However, theoretical studies on their convergence properties are limited due to the highly nonconvex nature of DNN training. In this paper, we aim at providing a general methodology for p…

    Submitted 12 May, 2019; v1 submitted 1 March, 2018; originally announced March 2018.

    Comments: 27 pages, 2 figures

    Journal ref: Proceedings of the 36th International Conference on Machine Learning (ICML), 2019