Showing 1–12 of 12 results for author: Ahn, K

Searching in archive stat.
  1. arXiv:2405.16002  [pdf, other]

    cs.LG math.OC stat.ML

    Does SGD really happen in tiny subspaces?

    Authors: Minhak Song, Kwangjun Ahn, Chulhee Yun

    Abstract: Understanding the training dynamics of deep neural networks is challenging due to their high-dimensional nature and intricate loss landscapes. Recent studies have revealed that, along the training trajectory, the gradient approximately aligns with a low-rank top eigenspace of the training loss Hessian, referred to as the dominant subspace. Given this alignment, this paper explores whether neural n…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 22 pages

  2. arXiv:2305.15287  [pdf, other]

    cs.LG cs.AI stat.ML

    The Crucial Role of Normalization in Sharpness-Aware Minimization

    Authors: Yan Dai, Kwangjun Ahn, Suvrit Sra

    Abstract: Sharpness-Aware Minimization (SAM) is a recently proposed gradient-based optimizer (Foret et al., ICLR 2021) that greatly improves the prediction performance of deep neural networks. Consequently, there has been a surge of interest in explaining its empirical success. We focus, in particular, on understanding the role played by normalization, a key component of the SAM updates. We theoretically an…

    Submitted 23 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 30 pages, Published in 37th Neural Information Processing Systems (NeurIPS 2023)

  3. arXiv:2305.09474  [pdf]

    q-fin.RM stat.AP

    Probabilistic Forecast-based Portfolio Optimization of Electricity Demand at Low Aggregation Levels

    Authors: Jungyeon Park, Estêvão Alvarenga, Jooyoung Jeon, Ran Li, Fotios Petropoulos, Hokyun Kim, Kwangwon Ahn

    Abstract: In the effort to achieve carbon neutrality through a decentralized electricity market, accurate short-term load forecasting at low aggregation levels has become increasingly crucial for various market participants' strategies. Accurate probabilistic forecasts at low aggregation levels can improve peer-to-peer energy sharing, demand response, and the operation of reliable distribution networks. How…

    Submitted 18 April, 2023; originally announced May 2023.

  4. arXiv:2207.13853  [pdf, other]

    cs.LG eess.SY stat.ML

    One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares

    Authors: Youngjae Min, Kwangjun Ahn, Navid Azizan

    Abstract: While deep neural networks are capable of achieving state-of-the-art performance in various domains, their training typically requires iterating for many passes over the dataset. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we inv…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: IEEE Conference on Decision and Control, 2022

  5. arXiv:2202.04598  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Reproducibility in Optimization: Theoretical Framework and Limits

    Authors: Kwangjun Ahn, Prateek Jain, Ziwei Ji, Satyen Kale, Praneeth Netrapalli, Gil I. Shamir

    Abstract: We initiate a formal study of reproducibility in optimization. We define a quantitative measure of reproducibility of optimization procedures in the face of noisy or error-prone operations such as inexact or stochastic gradient computations or inexact initialization. We then analyze several convex optimization settings of interest such as smooth, non-smooth, and strongly-convex objective functions…

    Submitted 4 December, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: 45 Pages; Accepted to NeurIPS 2022

  6. arXiv:2201.13419  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Agnostic Learnability of Halfspaces via Logistic Loss

    Authors: Ziwei Ji, Kwangjun Ahn, Pranjal Awasthi, Satyen Kale, Stefani Karp

    Abstract: We investigate approximation guarantees provided by logistic regression for the fundamental problem of agnostic learning of homogeneous halfspaces. Previously, for a certain broad class of "well-behaved" distributions on the examples, Diakonikolas et al. (2020) proved an $\tilde{\Omega}(\textrm{OPT})$ lower bound, while Frei et al. (2021) proved an $\tilde{O}(\sqrt{\textrm{OPT}})$ upper bound, where…

    Submitted 31 January, 2022; originally announced January 2022.

  7. arXiv:2102.00937  [pdf, other]

    math.OC cs.LG math.DG stat.ML

    Riemannian Perspective on Matrix Factorization

    Authors: Kwangjun Ahn, Felipe Suarez

    Abstract: We study the non-convex matrix factorization approach to matrix completion via Riemannian geometry. Based on an optimization formulation over a Grassmannian manifold, we characterize the landscape based on the notion of principal angles between subspaces. For the fully observed case, our results show that there is a region in which the cost is geodesically convex, and outside of which all critical…

    Submitted 1 February, 2021; originally announced February 2021.

    Comments: 23 pages, 6 figures. Comments would be appreciated!

  8. arXiv:2006.06946  [pdf, other]

    math.OC stat.ML

    SGD with shuffling: optimal rates without component convexity and large epoch requirements

    Authors: Kwangjun Ahn, Chulhee Yun, Suvrit Sra

    Abstract: We study without-replacement SGD for solving finite-sum optimization problems. Specifically, depending on how the indices of the finite-sum are shuffled, we consider the RandomShuffle (shuffle at the beginning of each epoch) and SingleShuffle (shuffle only once) algorithms. First, we establish minimax optimal convergence rates of these algorithms up to poly-log factors. Notably, our analysis is ge…

    Submitted 21 June, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: 53 pages; supersedes the preprint arXiv:2004.08657; v2 corrects an erroneous claim about SingleShuffle and newly adds Theorem 24 and Appendix F for SingleShuffle

  9. arXiv:2001.08876  [pdf, other]

    math.OC stat.ML

    From Nesterov's Estimate Sequence to Riemannian Acceleration

    Authors: Kwangjun Ahn, Suvrit Sra

    Abstract: We propose the first global accelerated gradient method for Riemannian manifolds. Toward establishing our result we revisit Nesterov's estimate sequence technique and develop an alternative analysis for it that may also be of independent interest. Then, we extend this analysis to the Riemannian setting, localizing the key difficulty due to non-Euclidean structure into a certain ``metric distortion…

    Submitted 23 January, 2020; originally announced January 2020.

    Comments: 30 pages

  10. Regression Models Using Shapes of Functions as Predictors

    Authors: Kyungmin Ahn, J. Derek Tucker, Wei Wu, Anuj Srivastava

    Abstract: Functional variables are often used as predictors in regression problems. A commonly-used parametric approach, called \emph{scalar-on-function regression}, uses the $L^2$ inner product to map functional predictors into scalar responses. This method can perform poorly when predictor functions contain undesired phase variability, causing phases to have disproportionately large influence on the resp…

    Submitted 25 May, 2020; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: 30 pages

  11. arXiv:1805.08956  [pdf, ps, other]

    math.ST cs.IT stat.ML

    Hypergraph Spectral Clustering in the Weighted Stochastic Block Model

    Authors: Kwangjun Ahn, Kangwook Lee, Changho Suh

    Abstract: Spectral clustering is a celebrated algorithm that partitions objects based on pairwise similarity information. While this approach has been successfully applied to a variety of domains, it comes with limitations. The reason is that there are many other applications in which only \emph{multi}-way similarity measures are available. This motivates us to explore the multi-way measurement setting. In…

    Submitted 23 May, 2018; originally announced May 2018.

    Comments: 16 pages; 3 figures

    Journal ref: October 2018 special issue on "Information-Theoretic Methods in Data Acquisition, Analysis, and Processing" of the IEEE Journal of Selected Topics in Signal Processing

  12. arXiv:1709.03670  [pdf, other]

    cs.IT cs.LG stat.ML

    Community Recovery in Hypergraphs

    Authors: Kwangjun Ahn, Kangwook Lee, Changho Suh

    Abstract: Community recovery is a central problem that arises in a wide variety of applications such as network clustering, motion segmentation, face clustering and protein complex detection. The objective of the problem is to cluster data points into distinct communities based on a set of measurements, each of which is associated with the values of a certain number of data points. While most of the prior w…

    Submitted 11 September, 2017; originally announced September 2017.

    Comments: 25 pages, 7 figures. Submitted to IEEE Transactions on Information Theory