Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

Authors: Joo Young Choi, Jaesung R. Park, Inkyu Park, Jaewoong Cho, Albert No, Ernest K. Ryu

Abstract: Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers without changing or tuning the other parts of the U-Net architecture improves the image generation quality. For example, a drop-in addition of LoRA conditioning to EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.13228 [pdf, other]

Optimal Acceleration for Minimax and Fixed-Point Problems is Not Unique

Authors: TaeHo Yoon, Jaeyeon Kim, Jaewook J. Suh, Ernest K. Ryu

Abstract: Recently, accelerated algorithms using the anchoring mechanism for minimax optimization and fixed-point problems have been proposed, and matching complexity lower bounds establish their optimality. In this work, we present the surprising observation that the optimal acceleration mechanism in minimax optimization and fixed-point problems is not unique. Our new algorithms achieve exactly the same worst-case convergence rates as existing anchor-based methods while using materially different acceleration mechanisms. Specifically, these new algorithms are dual to the prior anchor-based accelerated methods in the sense of H-duality. This finding opens a new avenue of research on accelerated algorithms since we now have a family of methods that empirically exhibit varied characteristics while having the same optimal worst-case guarantee.

Submitted 23 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2403.17199 [pdf, other]

Extracting Social Support and Social Isolation Information from Clinical Psychiatry Notes: Comparing a Rule-based NLP System and a Large Language Model

Authors: Braja Gopal Patra, Lauren A. Lepow, Praneet Kasi Reddy Jagadeesh Kumar, Veer Vekaria, Mohit Manoj Sharma, Prakash Adekkanattu, Brian Fennessy, Gavin Hynes, Isotta Landi, Jorge A. Sanchez-Ruiz, Euijung Ryu, Joanna M. Biernacka, Girish N. Nadkarni, Ardesheer Talati, Myrna Weissman, Mark Olfson, J. John Mann, Alexander W. Charney, Jyotishman Pathak

Abstract: Background: Social support (SS) and social isolation (SI) are social determinants of health (SDOH) associated with psychiatric outcomes. In electronic health records (EHRs), individual-level SS/SI is typically documented as narrative clinical notes rather than structured coded data. Natural language processing (NLP) algorithms can automate the otherwise labor-intensive process of data extraction. Data and Methods: Psychiatric encounter notes from Mount Sinai Health System (MSHS, n=300) and Weill Cornell Medicine (WCM, n=225) were annotated to establish a gold standard corpus. A rule-based system (RBS) involving lexicons and a large language model (LLM) using FLAN-T5-XL were developed to identify mentions of SS and SI and their subcategories (e.g., social network, instrumental support, and loneliness). Results: For extracting SS/SI, the RBS obtained higher macro-averaged f-scores than the LLM at both MSHS (0.89 vs. 0.65) and WCM (0.85 vs. 0.82). For extracting subcategories, the RBS also outperformed the LLM at both MSHS (0.90 vs. 0.62) and WCM (0.82 vs. 0.81). Discussion and Conclusion: Unexpectedly, the RBS outperformed the LLM across all metrics. Intensive review demonstrates that this finding is due to the divergent approaches taken by the RBS and the LLM. The RBS was designed and refined to follow the same specific rules as the gold standard annotations. Conversely, the LLM was more inclusive with categorization and conformed to common English-language understanding. Both approaches offer advantages and are made available open-source for future testing.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 2 figures, 3 tables

arXiv:2403.04616 [pdf, other]

Modeling reputation-based behavioral biases in school choice

Authors: Jon Kleinberg, Sigal Oren, Emily Ryu, Éva Tardos

Abstract: A fundamental component in the theoretical school choice literature is the problem a student faces in deciding which schools to apply to. Recent models have considered a set of schools of different selectiveness and a student who is unsure of their strength and can apply to at most $k$ schools. Such models assume that the student cares solely about maximizing the quality of the school that they attend, but experience suggests that students' decisions are also influenced by a set of behavioral biases based on reputational effects: a subjective reputational benefit when admitted to a selective school, whether or not they attend; and a subjective loss based on disappointment when rejected. Guided by these observations, and inspired by recent behavioral economics work on loss aversion relative to expectations, we propose a behavioral model by which a student chooses schools to balance these behavioral effects with the quality of the school they attend. Our main results show that a student's choices change in dramatic ways when these reputation-based behavioral biases are taken into account. In particular, where a rational applicant spreads their applications evenly, a biased student applies very sparsely to highly selective schools, such that above a certain threshold they apply to only an absolute constant number of schools even as their budget of applications grows to infinity. Consequently, a biased student underperforms a rational student even when the rational student is restricted to a sufficiently large upper bound on applications and the biased student can apply to arbitrarily many. Our analysis shows that the reputation-based model is rich enough to cover a range of different ways that biased students cope with fear of rejection, including not just targeting less selective schools, but also occasionally applying to schools that are too selective, compared to rational students.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 22 pages, 8 figures

arXiv:2403.03937 [pdf, ps, other]

Settling the Competition Complexity of Additive Buyers over Independent Items

Authors: Mahsa Derakhshan, Emily Ryu, S. Matthew Weinberg, Eric Xue

Abstract: The competition complexity of an auction setting is the number of additional bidders needed such that the simple mechanism of selling items separately (with additional bidders) achieves greater revenue than the optimal but complex (randomized, prior-dependent, Bayesian-truthful) mechanism without the additional bidders. Our main result settles the competition complexity of $n$ bidders with additive values over $m < n$ independent items at $Θ(\sqrt{nm})$. The $O(\sqrt{nm})$ upper bound is due to [BW19], and our main result improves the prior lower bound of $Ω(\ln n)$ to $Ω(\sqrt{nm})$. Our main result follows from an explicit construction of a Bayesian IC auction for $n$ bidders with additive values over $m<n$ independent items drawn from the Equal Revenue curve truncated at $\sqrt{nm}$ ($\mathcal{ER}_{\le \sqrt{nm}}$), which achieves revenue that exceeds $\text{SRev}_{n+\sqrt{nm}}(\mathcal{ER}_{\le \sqrt{nm}}^m)$. Along the way, we show that the competition complexity of $n$ bidders with additive values over $m$ independent items is exactly equal to the minimum $c$ such that $\text{SRev}_{n+c}(\mathcal{ER}_{\le p}^m) \geq \text{Rev}_n(\mathcal{ER}_{\le p}^m)$ for all $p$ (that is, some truncated Equal Revenue witnesses the worst-case competition complexity). Interestingly, we also show that the untruncated Equal Revenue curve does not witness the worst-case competition complexity when $n > m$: $\text{SRev}_n(\mathcal{ER}^m) = nm+O_m(\ln (n)) \leq \text{SRev}_{n+O_m(\ln (n))}(\mathcal{ER}^m)$, and therefore our result can only follow by considering all possible truncations.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 50 pages

arXiv:2402.11867 [pdf, other]

LoRA Training in the NTK Regime has No Spurious Local Minima

Authors: Uijeong Jang, Jason D. Lee, Ernest K. Ryu

Abstract: Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLM), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with $N$ data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank $r\lesssim \sqrt{N}$; (ii) using LoRA with rank $r\gtrsim \sqrt{N}$ eliminates spurious local minima, allowing gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.

Submitted 28 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: 23 pages

arXiv:2311.17296 [pdf, other]

Mirror Duality in Convex Optimization

Authors: Jaeyeon Kim, Chanwoo Park, Asuman Ozdaglar, Jelena Diakonikolas, Ernest K. Ryu

Abstract: While first-order optimization methods are usually designed to efficiently reduce the function value $f(x)$, there has been recent interest in methods efficiently reducing the magnitude of $\nabla f(x)$, and the findings show that the two types of methods exhibit a certain symmetry. In this work, we present mirror duality, a one-to-one correspondence between mirror-descent-type methods reducing function value and reducing gradient magnitude. Using mirror duality, we obtain the dual accelerated mirror descent (dual-AMD) method that efficiently reduces $ψ^*(\nabla f(x))$, where $ψ$ is a distance-generating function and $ψ^*$ quantifies the magnitude of $\nabla f(x)$. We then apply dual-AMD to efficiently reduce $\|\nabla f(\cdot) \|_q$ for $q\in [2,\infty)$ and to efficiently compute $\varepsilon$-approximate solutions of the optimal transport problem.

Submitted 15 May, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

arXiv:2310.18297 [pdf, other]

Image Clustering Conditioned on Text Criteria

Authors: Sehyun Kwon, Jaeseung Park, Minkyu Kim, Jaewoong Cho, Ernest K. Ryu, Kangwook Lee

Abstract: Classical clustering methods do not provide users with direct control of the clustering results, and the clustering results may not be consistent with the relevant criterion that a user has in mind. In this work, we present a new methodology for performing image clustering based on user-specified text criteria by leveraging modern vision-language models and large language models. We call our method Image Clustering Conditioned on Text Criteria (IC|TC), and it represents a different paradigm of image clustering. IC|TC requires a minimal and practical degree of human intervention and grants the user significant control over the clustering results in return. Our experiments show that IC|TC can effectively cluster images with various criteria, such as human action, physical location, or the person's mood, while significantly outperforming baselines.

Submitted 21 February, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

arXiv:2307.02770 [pdf, other]

Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback

Authors: TaeHo Yoon, Kibeom Myoung, Keon Lee, Jaewoong Cho, Albert No, Ernest K. Ryu

Abstract: Diffusion models have recently shown remarkable success in high-quality image generation. Sometimes, however, a pre-trained diffusion model exhibits partial misalignment in the sense that the model can generate good images, but it sometimes outputs undesirable images. If so, we simply need to prevent the generation of the bad images, and we call this task censoring. In this work, we present censored generation with a pre-trained diffusion model using a reward model trained on minimal human feedback. We show that censoring can be accomplished with extreme human feedback efficiency and that labels generated with a mere few minutes of human feedback are sufficient. Code available at: https://github.com/tetrzim/diffusion-human-feedback.

Submitted 30 October, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Published in NeurIPS 2023

arXiv:2305.16569 [pdf, ps, other]

Accelerating Value Iteration with Anchoring

Authors: Jongmin Lee, Ernest K. Ryu

Abstract: Value Iteration (VI) is foundational to the theory and practice of modern reinforcement learning, and it is known to converge at a $\mathcal{O}(γ^k)$-rate, where $γ$ is the discount factor. Surprisingly, however, the optimal rate for the VI setup was not known, and finding a general acceleration mechanism has been an open problem. In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators. Our method, called Anc-VI, is based on an anchoring mechanism (distinct from Nesterov's acceleration), and it reduces the Bellman error faster than standard VI. In particular, Anc-VI exhibits a $\mathcal{O}(1/k)$-rate for $γ\approx 1$ or even $γ=1$, while standard VI has rate $\mathcal{O}(1)$ for $γ\ge 1-1/k$, where $k$ is the iteration count. We also provide a complexity lower bound matching the upper bound up to a constant factor of $4$, thereby establishing optimality of the accelerated rate of Anc-VI. Finally, we show that the anchoring mechanism provides the same benefit in the approximate VI and Gauss-Seidel VI setups as well.

Submitted 28 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Journal ref: Neural Information Processing Systems 2023

arXiv:2305.15704 [pdf, ps, other]

Computer-Assisted Design of Accelerated Composite Optimization Methods: OptISTA

Authors: Uijeong Jang, Shuvomoy Das Gupta, Ernest K. Ryu

Abstract: The accelerated composite optimization method FISTA (Beck, Teboulle 2009) is suboptimal, and we present a new method OptISTA that improves upon it by a factor of 2. The performance estimation problem (PEP) has recently been introduced as a new computer-assisted paradigm for designing optimal first-order methods, but the methodology was largely limited to unconstrained optimization with a single function. In this work, we present a novel double-function stepsize-optimization PEP methodology that poses the optimization over fixed-step first-order methods for composite optimization as a finite-dimensional nonconvex QCQP, which can be practically solved through spatial branch-and-bound algorithms, and use it to design the exact optimal method OptISTA for the composite optimization setup. We then establish the exact optimality of OptISTA with a novel lower-bound construction that extends the semi-interpolated zero-chain construction (Drori, Taylor 2022) to the double-function setup of composite optimization. By establishing exact optimality, our work concludes the search for the fastest first-order methods for the proximal, projected-gradient, and proximal-gradient setups.

Submitted 1 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 54 pages. There are two major changes. 1. In our prior submission, the method termed "OPM" was identified as existing work. Consequently, we have made appropriate modifications to the manuscript. 2. The proof for Theorem 1 has been replaced with an alternative proof (Section 2.2 and Appendix C), which we believe is more intuitive

arXiv:2305.12211 [pdf, other]

Coordinate-Update Algorithms can Efficiently Detect Infeasible Optimization Problems

Authors: Jinhee Paeng, Jisun Park, Ernest K. Ryu

Abstract: Coordinate update/descent algorithms are widely used in large-scale optimization due to their low per-iteration cost and scalability, but their behavior on infeasible or misspecified problems has not been studied as much as that of algorithms that use full updates. For coordinate-update methods to be adopted widely enough to serve as engines of general-purpose solvers, it is necessary to also understand their behavior under pathological problem instances. In this work, we show that the normalized iterates of randomized coordinate-update fixed-point iterations (RC-FPI) converge to the infimal displacement vector and use this result to design an efficient infeasibility detection method. We then extend the analysis to the setup where the coordinates are defined by a non-orthonormal basis using the Friedrichs angle and then apply the machinery to decentralized optimization problems.

Submitted 19 November, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

arXiv:2305.06628 [pdf, ps, other]

Time-Reversed Dissipation Induces Duality Between Minimizing Gradient Norm and Function Value

Authors: Jaeyeon Kim, Asuman Ozdaglar, Chanwoo Park, Ernest K. Ryu

Abstract: In convex optimization, first-order optimization methods efficiently minimizing function values have been a central subject of study since Nesterov's seminal work of 1983. Recently, however, Kim and Fessler's OGM-G and Lee et al.'s FISTA-G have been presented as alternatives that efficiently minimize the gradient magnitude instead. In this paper, we present H-duality, which represents a surprising one-to-one correspondence between methods efficiently minimizing function values and methods efficiently minimizing gradient magnitude. In continuous-time formulations, H-duality corresponds to reversing the time dependence of the dissipation/friction term. To the best of our knowledge, H-duality is different from Lagrange/Fenchel duality and is distinct from any previously known duality or symmetry relations. Using H-duality, we obtain a clearer understanding of the symmetry between Nesterov's method and OGM-G, derive a new class of methods efficiently reducing gradient magnitudes of smooth convex functions, and find a new composite minimization method that is simpler and faster than FISTA-G.

Submitted 31 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

arXiv:2304.13995 [pdf, other]

Rotation and Translation Invariant Representation Learning with Implicit Neural Representations

Authors: Sehyun Kwon, Joo Young Choi, Ernest K. Ryu

Abstract: In many computer vision applications, images are acquired with arbitrary or random rotations and translations, and in such setups, it is desirable to obtain semantic representations disentangled from the image orientation. Examples of such applications include semiconductor wafer defect inspection, plankton microscope images, and inference on single-particle cryo-electron microscopy (cryo-EM) micrographs. In this work, we propose Invariant Representation Learning with Implicit Neural Representation (IRL-INR), which uses an implicit neural representation (INR) with a hypernetwork to obtain semantic representations disentangled from the orientation of the image. We show that IRL-INR can effectively learn disentangled semantic representations on more complex images compared to those considered in prior works and show that these semantic representations synergize well with SCAN to produce state-of-the-art unsupervised clustering results.

Submitted 12 June, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

arXiv:2304.00771 [pdf, other]

Continuous-time Analysis of Anchor Acceleration

Authors: Jaewook J. Suh, Jisun Park, Ernest K. Ryu

Abstract: Recently, the anchor acceleration, an acceleration mechanism distinct from Nesterov's, has been discovered for minimax optimization and fixed-point problems, but its mechanism is not understood well, much less so than Nesterov acceleration. In this work, we analyze continuous-time models of anchor acceleration. We provide tight, unified analyses for characterizing the convergence rate as a function of the anchor coefficient $β(t)$, thereby providing insight into the anchor acceleration mechanism and its accelerated $\mathcal{O}(1/k^2)$-convergence rate. Finally, we present an adaptive method inspired by the continuous-time analyses and establish its effectiveness through theoretical analyses and experiments.

Submitted 2 November, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

arXiv:2303.15876 [pdf, other]

Accelerated Infeasibility Detection of Constrained Optimization and Fixed-Point Iterations

Authors: Jisun Park, Ernest K. Ryu

Abstract: As first-order optimization methods become the method of choice for solving large-scale optimization problems, optimization solvers based on first-order algorithms are being built. Such general-purpose solvers must robustly detect infeasible or misspecified problem instances, but the computational complexity of first-order methods for doing so has yet to be formally studied. In this work, we characterize the optimal accelerated rate of infeasibility detection. We show that the standard fixed-point iteration achieves $\mathcal{O}(1/k^2)$ and $\mathcal{O}(1/k)$ rates, respectively, on the normalized iterates and the fixed-point residual converging to the infimal displacement vector, while the accelerated fixed-point iteration achieves $\mathcal{O}(1/k^2)$ and $\tilde{\mathcal{O}}(1/k^2)$ rates. We then provide a matching complexity lower bound to establish that $Θ(1/k^2)$ is indeed the optimal accelerated rate.

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2302.03239 [pdf, ps, other]

Calibrated Recommendations for Users with Decaying Attention

Authors: Jon Kleinberg, Emily Ryu, Éva Tardos

Abstract: Recommendation systems capable of providing diverse sets of results are a focus of increasing importance, with motivations ranging from fairness to novelty and other aspects of optimizing user experience. One form of diversity of recent interest is calibration, the notion that personalized recommendations should reflect the full distribution of a user's interests, rather than a single predominant category -- for instance, a user who mainly reads entertainment news but also wants to keep up with news on the environment and the economy would prefer to see a mixture of these genres, not solely entertainment news. Existing work has formulated calibration as a subset selection problem; this line of work observes that the formulation requires the unrealistic assumption that all recommended items receive equal consideration from the user, but leaves as an open question the more realistic setting in which user attention decays as they move down the list of results. In this paper, we consider calibration with decaying user attention under two different models. In both models, there is a set of underlying genres that items can belong to. In the first setting, where items are represented by fine-grained mixtures of genre percentages, we provide a $(1-1/e)$-approximation algorithm by extending techniques for constrained submodular optimization. In the second setting, where items are coarsely binned into a single genre each, we surpass the $(1-1/e)$ barrier imposed by submodular maximization and give a $2/3$-approximate greedy algorithm. Our work thus addresses the problem of capturing ordering effects due to decaying attention, allowing for the extension of near-optimal calibration from recommendation sets to recommendation lists.

Submitted 6 February, 2023; originally announced February 2023.

Comments: 24 pages, 1 figure. This paper incorporates and supersedes our earlier paper arXiv:2203.00233

arXiv:2211.15604 [pdf, other]

Convergence Analyses of Davis-Yin Splitting via Scaled Relative Graphs II: Convex Optimization Problems

Authors: Soheun Yi, Ernest K. Ryu

Abstract: The prior work of [arXiv:2207.04015, 2022] used scaled relative graphs (SRG) to analyze the convergence of Davis-Yin splitting (DYS) iterations on monotone inclusion problems. In this work, we use this machinery to analyze DYS iterations on convex optimization problems and obtain state-of-the-art linear convergence rates.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2207.04015 [pdf, other]

Convergence Analyses of Davis-Yin Splitting via Scaled Relative Graphs

Authors: Jongmin Lee, Soheun Yi, Ernest K. Ryu

Abstract: Davis-Yin splitting (DYS) has found a wide range of applications in optimization, but its linear rates of convergence have not been studied extensively. The scaled relative graph (SRG) simplifies the convergence analysis of operator splitting methods by mapping the action of the operator onto the complex plane, but the prior SRG theory did not fully apply to the DYS operator. In this work, we formalize an SRG theory for the DYS operator and use it to obtain tighter contraction factors.

Submitted 21 April, 2024; v1 submitted 8 July, 2022; originally announced July 2022.

arXiv:2205.11093 [pdf, other]

Accelerated Minimax Algorithms Flock Together

Authors: TaeHo Yoon, Ernest K. Ryu

Abstract: Several new accelerated methods in minimax optimization and fixed-point iterations have recently been discovered, and, interestingly, they rely on a mechanism distinct from Nesterov's momentum-based acceleration. In this work, we show that these accelerated algorithms exhibit what we call the merging path (MP) property; the trajectories of these algorithms merge quickly. Using this novel MP property, we establish point convergence of existing accelerated minimax algorithms and derive new state-of-the-art algorithms for the strongly-convex-strongly-concave setup and for the prox-grad setup.

Submitted 23 May, 2022; originally announced May 2022.

arXiv:2203.07305 [pdf, other]

Branch-and-Bound Performance Estimation Programming: A Unified Methodology for Constructing Optimal Optimization Methods

Authors: Shuvomoy Das Gupta, Bart P. G. Van Parys, Ernest K. Ryu

Abstract: We present the Branch-and-Bound Performance Estimation Programming (BnB-PEP), a unified methodology for constructing optimal first-order methods for convex and nonconvex optimization. BnB-PEP poses the problem of finding the optimal optimization method as a nonconvex but practically tractable quadratically constrained quadratic optimization problem and solves it to certifiable global optimality using a customized branch-and-bound algorithm. By directly confronting the nonconvexity, BnB-PEP offers significantly more flexibility and removes the many limitations of the prior methodologies. Our customized branch-and-bound algorithm, through exploiting specific problem structures, outperforms the latest off-the-shelf implementations by orders of magnitude, accelerating the solution time from hours to seconds and weeks to minutes. We apply BnB-PEP to several setups for which the prior methodologies do not apply and obtain methods with bounds that improve upon prior state-of-the-art results. Finally, we use the BnB-PEP methodology to find proofs with potential function structures, thereby systematically generating analytical convergence proofs.

Submitted 8 June, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: Published in Mathematical Programming Series A

arXiv:2203.00233 [pdf, ps, other]

Ordered Submodularity and its Applications to Diversifying Recommendations

Authors: Jon Kleinberg, Emily Ryu, Éva Tardos

Abstract: A fundamental task underlying many important optimization problems, from influence maximization to sensor placement to content recommendation, is to select the optimal group of $k$ items from a larger set. Submodularity has been very effective in allowing approximation algorithms for such subset selection problems. However, in several applications, we are interested not only in the elements of a set, but also the order in which they appear, breaking the assumption that all selected items receive equal consideration. One such category of applications involves the presentation of search results, product recommendations, news articles, and other content, due to the well-documented phenomenon that humans pay greater attention to higher-ranked items. As a result, optimization in content presentation for diversity, user coverage, calibration, or other objectives more accurately represents a sequence selection problem, to which traditional submodularity approximation results no longer apply. Although extensions of submodularity to sequences have been proposed, none is designed to model settings where items contribute based on their position in a ranked list, and hence they are not able to express these types of optimization problems. In this paper, we aim to address this modeling gap. Here, we propose a new formalism of ordered submodularity that captures these ordering problems in content presentation, and more generally a category of optimization problems over ranked sequences in which different list positions contribute differently to the objective function. We analyze the natural ordered analogue of the greedy algorithm and show that it provides a $2$-approximation. We also show that this bound is tight, establishing that our new framework is conceptually and quantitatively distinct from previous formalisms of set and sequence submodularity.

Submitted 1 March, 2022; originally announced March 2022.

Comments: 17 pages

arXiv:2202.11910 [pdf, other]

Robust Probabilistic Time Series Forecasting

Authors: TaeHo Yoon, Youngsuk Park, Ernest K. Ryu, Yuyang Wang

Abstract: Probabilistic time series forecasting has played a critical role in decision-making processes due to its capability to quantify uncertainties. Deep forecasting models, however, could be prone to input perturbations, and the notion of such perturbations, together with that of robustness, has not even been completely established in the regime of probabilistic forecasting. In this work, we propose a framework for robust probabilistic time series forecasting. First, we generalize the concept of adversarial input perturbations, based on which we formulate the concept of robustness in terms of bounded Wasserstein deviation. Then we extend the randomized smoothing technique to attain robust probabilistic forecasters with theoretical robustness certificates against certain classes of adversarial perturbations. Lastly, extensive experiments demonstrate that our methods are empirically effective in enhancing the forecast quality under additive adversarial attacks and forecast consistency under supplement of noisy observations.

Submitted 24 February, 2022; originally announced February 2022.

Comments: AISTATS 2022 camera ready version

arXiv:2202.05501 [pdf, ps, other]

Continuous-Time Analysis of Accelerated Gradient Methods via Conservation Laws in Dilated Coordinate Systems

Authors: Jaewook J. Suh, Gyumin Roh, Ernest K. Ryu

Abstract: We analyze continuous-time models of accelerated gradient methods through deriving conservation laws in dilated coordinate systems. Namely, instead of analyzing the dynamics of $X(t)$, we analyze the dynamics of $W(t)=t^\alpha(X(t)-X_c)$ for some $\alpha$ and $X_c$ and derive a conserved quantity, analogous to physical energy, in this dilated coordinate system. Through this methodology, we recover many known continuous-time analyses in a streamlined manner and obtain novel continuous-time analyses for OGM-G, an acceleration mechanism for efficiently reducing gradient magnitude that is distinct from that of Nesterov. Finally, we show that a semi-second-order symplectic Euler discretization in the dilated coordinate system leads to an $\mathcal{O}(1/k^2)$ rate on the standard setup of smooth convex minimization, without any further assumptions such as infinite differentiability.

Submitted 24 June, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

arXiv:2202.02981 [pdf, other]

Neural Tangent Kernel Analysis of Deep Narrow Neural Networks

Authors: Jongmin Lee, Joo Young Choi, Ernest K. Ryu, Albert No

Abstract: The tremendous recent progress in analyzing the training dynamics of overparameterized neural networks has primarily focused on wide networks and therefore does not sufficiently address the role of depth in deep learning. In this work, we present the first trainability guarantee of infinitely deep but narrow neural networks. We study the infinite-depth limit of a multilayer perceptron (MLP) with a specific initialization and establish a trainability guarantee using the NTK theory. We then extend the analysis to an infinitely deep convolutional neural network (CNN) and perform brief experiments.

Submitted 27 June, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Journal ref: Published in International Conference on Machine Learning, 2022

arXiv:2201.11413 [pdf, other]

Exact Optimal Accelerated Complexity for Fixed-Point Iterations

Authors: Jisun Park, Ernest K. Ryu

Abstract: Despite the broad use of fixed-point iterations throughout applied mathematics, the optimal convergence rate of general fixed-point problems with nonexpansive nonlinear operators has not been established. This work presents an acceleration mechanism for fixed-point iterations with nonexpansive operators, contractive operators, and nonexpansive operators satisfying a Hölder-type growth condition. We then provide matching complexity lower bounds to establish the exact optimality of the acceleration mechanisms in the nonexpansive and contractive setups. Finally, we provide experiments with CT imaging, optimal transport, and decentralized optimization to demonstrate the practical effectiveness of the acceleration mechanism.

Submitted 27 June, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: ICML 2022 Long Talk

arXiv:2201.09077 [pdf, other]

LTC-GIF: Attracting More Clicks on Feature-length Sports Videos

Authors: Ghulam Mujtaba, Jaehyuk Choi, Eun-Seok Ryu

Abstract: This paper proposes a lightweight method to attract users and increase views of the video by presenting personalized artistic media -- i.e., static thumbnails and animated GIFs. This method analyzes lightweight thumbnail containers (LTC) using computational resources of the client device to recognize personalized events from full-length sports videos. In addition, instead of processing the entire video, small video segments are processed to generate artistic media. This makes the proposed approach more computationally efficient compared to the baseline approaches that create artistic media using the entire video. The proposed method retrieves and uses thumbnail containers and video segments, which reduces the required transmission bandwidth as well as the amount of locally stored data used during artistic media generation. When extensive experiments were conducted on the Nvidia Jetson TX2, the computational complexity of the proposed method was 3.57 times lower than that of the SoA method. In the qualitative assessment, GIFs generated using the proposed method received 1.02 higher overall ratings compared to the SoA method. To the best of our knowledge, this is the first technique that uses LTC to generate artistic media while providing lightweight and high-performance services even on resource-constrained devices.

Submitted 22 January, 2022; originally announced January 2022.

arXiv:2201.09049 [pdf, other]

doi 10.1109/ACCESS.2022.3209275

LTC-SUM: Lightweight Client-driven Personalized Video Summarization Framework Using 2D CNN

Authors: Ghulam Mujtaba, Adeel Malik, Eun-Seok Ryu

Abstract: This paper proposes a novel lightweight thumbnail container-based summarization (LTC-SUM) framework for full feature-length videos. This framework generates a personalized keyshot summary for concurrent users by using the computational resource of the end-user device. State-of-the-art methods that acquire and process entire video data to generate video summaries are highly computationally intensive. In this regard, the proposed LTC-SUM method uses lightweight thumbnails to handle the complex process of detecting events. This significantly reduces computational complexity and improves communication and storage efficiency by resolving computational and privacy bottlenecks in resource-constrained end-user devices. These improvements were achieved by designing a lightweight 2D CNN model to extract features from thumbnails, which helped select and retrieve only a handful of specific segments. Extensive quantitative experiments on a set of 18 full feature-length videos (approximately 32.9 h in duration) showed that the proposed method is significantly more computationally efficient than state-of-the-art methods on the same end-user device configurations. Joint qualitative assessments of the results of 56 participants showed that participants gave higher ratings to the summaries generated using the proposed method. To the best of our knowledge, this is the first attempt at designing a fully client-driven personalized keyshot video summarization framework using thumbnail containers for feature-length videos.

Submitted 4 October, 2022; v1 submitted 22 January, 2022; originally announced January 2022.

Comments: 14

Journal ref: in IEEE Access, vol. 10, pp. 103041-103055, 2022

arXiv:2112.09379 [pdf]

Enhanced Frame and Event-Based Simulator and Event-Based Video Interpolation Network

Authors: Adam Radomski, Andreas Georgiou, Thomas Debrunner, Chenghan Li, Luca Longinotti, Minwon Seo, Moosung Kwak, Chang-Woo Shin, Paul K. J. Park, Hyunsurk Eric Ryu, Kynan Eng

Abstract: Fast neuromorphic event-based vision sensors (Dynamic Vision Sensor, DVS) can be combined with slower conventional frame-based sensors to enable higher-quality inter-frame interpolation than traditional methods relying on fixed motion approximations using e.g. optical flow. In this work we present a new, advanced event simulator that can produce realistic scenes recorded by a camera rig with an arbitrary number of sensors located at fixed offsets. It includes a new configurable frame-based image sensor model with realistic image quality reduction effects, and an extended DVS model with more accurate characteristics. We use our simulator to train a novel reconstruction model designed for end-to-end reconstruction of high-fps video. Unlike previously published methods, our method does not require the frame and DVS cameras to have the same optics, positions, or camera resolutions. It is also not limited to objects at a fixed distance from the sensor. We show that data generated by our simulator can be used to train our new model, leading to reconstructed images on public datasets of equivalent or better quality than the state of the art. We also show our sensor generalizing to data recorded by real sensors.

Submitted 17 December, 2021; originally announced December 2021.

Comments: 10 pages, 19 figures

arXiv:2110.11035 [pdf, ps, other]

Optimal First-Order Algorithms as a Function of Inequalities

Authors: Chanwoo Park, Ernest K. Ryu

Abstract: In this work, we present a novel algorithm design methodology that finds the optimal algorithm as a function of inequalities. Specifically, we restrict convergence analyses of algorithms to use a prespecified subset of inequalities, rather than utilizing all true inequalities, and find the optimal algorithm subject to this restriction. This methodology allows us to design algorithms with certain desired characteristics. As concrete demonstrations of this methodology, we find new state-of-the-art accelerated first-order gradient methods using randomized coordinate updates and backtracking line searches.

Submitted 21 March, 2024; v1 submitted 21 October, 2021; originally announced October 2021.

arXiv:2106.10439 [pdf, other]

A Geometric Structure of Acceleration and Its Role in Making Gradients Small Fast

Authors: Jongmin Lee, Chanwoo Park, Ernest K. Ryu

Abstract: Since Nesterov's seminal 1983 work, many accelerated first-order optimization methods have been proposed, but their analyses lack a common unifying structure. In this work, we identify a geometric structure satisfied by a wide range of first-order accelerated methods. Using this geometric insight, we present several novel generalizations of accelerated methods. Most interesting among them is a method that reduces the squared gradient norm with $\mathcal{O}(1/K^4)$ rate in the prox-grad setup, faster than the $\mathcal{O}(1/K^3)$ rates of Nesterov's FGM or Kim and Fessler's FPGM-m.

Submitted 4 November, 2021; v1 submitted 19 June, 2021; originally announced June 2021.

Journal ref: Published in the Neural Information Processing Systems, 2021

arXiv:2104.09644 [pdf]

Neural Language Models with Distant Supervision to Identify Major Depressive Disorder from Clinical Notes

Authors: Bhavani Singh Agnikula Kshatriya, Nicolas A Nunez, Manuel Gardea-Resendez, Euijung Ryu, Brandon J Coombes, Sunyang Fu, Mark A Frye, Joanna M Biernacka, Yanshan Wang

Abstract: Major depressive disorder (MDD) is a prevalent psychiatric disorder that is associated with a significant healthcare burden worldwide. Phenotyping of MDD can help early diagnosis and consequently may have significant advantages in patient management. In prior research, MDD phenotypes have been extracted from structured Electronic Health Records (EHR) or using Electroencephalographic (EEG) data with traditional machine learning models to predict MDD phenotypes. However, MDD phenotypic information is also documented in free-text EHR data, such as clinical notes. While clinical notes may provide more accurate phenotyping information, natural language processing (NLP) algorithms must be developed to abstract such information. Recent advancements in NLP resulted in state-of-the-art neural language models, such as the Bidirectional Encoder Representations from Transformers (BERT) model, which is a transformer-based model that can be pre-trained from a corpus of unsupervised text data and then fine-tuned on specific tasks. However, such neural language models have been underutilized in clinical NLP tasks due to the lack of large training datasets. In the literature, researchers have utilized the distant supervision paradigm to train machine learning models on clinical text classification tasks to mitigate the issue of lacking annotated training data. It is still unknown whether the paradigm is effective for neural language models. In this paper, we propose to leverage the neural language models in a distant supervision paradigm to identify MDD phenotypes from clinical notes. The experimental results indicate that our proposed approach is effective in identifying MDD phenotypes and that the Bio-Clinical BERT, a specific BERT model for clinical data, achieved the best performance in comparison with conventional machine learning models.

Submitted 19 April, 2021; originally announced April 2021.

arXiv:2102.07922 [pdf, other]

Accelerated Algorithms for Smooth Convex-Concave Minimax Problems with $\mathcal{O}(1/k^2)$ Rate on Squared Gradient Norm

Authors: TaeHo Yoon, Ernest K. Ryu

Abstract: In this work, we study the computational complexity of reducing the squared gradient magnitude for smooth minimax optimization problems. First, we present algorithms with accelerated $\mathcal{O}(1/k^2)$ last-iterate rates, faster than the existing $\mathcal{O}(1/k)$ or slower rates for extragradient, Popov, and gradient descent with anchoring. The acceleration mechanism combines extragradient steps with anchoring and is distinct from Nesterov's acceleration. We then establish optimality of the $\mathcal{O}(1/k^2)$ rate through a matching lower bound.

Submitted 10 June, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

Comments: Published at ICML 2021 as a long talk

arXiv:2102.07541 [pdf, other]

WGAN with an Infinitely Wide Generator Has No Spurious Stationary Points

Authors: Albert No, TaeHo Yoon, Sehyun Kwon, Ernest K. Ryu

Abstract: Generative adversarial networks (GAN) are a widely used class of deep generative models, but their minimax training dynamics are not understood very well. In this work, we show that GANs with a 2-layer infinite-width generator and a 2-layer finite-width discriminator trained with stochastic gradient ascent-descent have no spurious stationary points. We then show that when the width of the generator is finite but wide, there are no spurious stationary points within a ball whose radius becomes arbitrarily large (to cover the entire parameter space) as the width goes to infinity.

Submitted 9 June, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

Comments: Published at ICML 2021

arXiv:2102.07366 [pdf, ps, other]

Factor-$\sqrt{2}$ Acceleration of Accelerated Gradient Methods

Authors: Chanwoo Park, Jisun Park, Ernest K. Ryu

Abstract: The optimized gradient method (OGM) provides a factor-$\sqrt{2}$ speedup upon Nesterov's celebrated accelerated gradient method in the convex (but non-strongly convex) setup. However, this improved acceleration mechanism has not been well understood; prior analyses of OGM relied on a computer-assisted proof methodology, so the proofs were opaque for humans despite being verifiable and correct. In this work, we present a new analysis of OGM based on a Lyapunov function and linear coupling. These analyses are developed and presented without the assistance of computers and are understandable by humans. Furthermore, we generalize OGM's acceleration mechanism and obtain a factor-$\sqrt{2}$ speedup in other setups: acceleration with a simpler rational stepsize, the strongly convex setup, and the mirror descent setup.

Submitted 24 May, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

arXiv:2001.02061 [pdf, ps, other]

Scaled Relative Graph of Normal Matrices

Authors: Xinmeng Huang, Ernest K. Ryu, Wotao Yin

Abstract: The Scaled Relative Graph (SRG) by Ryu, Hannah, and Yin (arXiv:1902.09788, 2019) is a geometric tool that maps the action of a multi-valued nonlinear operator onto the 2D plane, used to analyze the convergence of a wide range of iterative methods. As the SRG includes the spectrum for linear operators, we can view the SRG as a generalization of the spectrum to multi-valued nonlinear operators. In this work, we further study the SRG of linear operators and characterize the SRG of block-diagonal and normal matrices.

Submitted 8 January, 2020; v1 submitted 27 December, 2019; originally announced January 2020.

arXiv:1912.01593 [pdf, ps, other]

Tight Coefficients of Averaged Operators via Scaled Relative Graph

Authors: Xinmeng Huang, Ernest K. Ryu, Wotao Yin

Abstract: Many iterative methods in optimization are fixed-point iterations with averaged operators. As such methods converge at an $\mathcal{O}(1/k)$ rate with the constant determined by the averagedness coefficient, establishing small averagedness coefficients for operators is of broad interest. In this paper, we show that the averagedness coefficients of the composition of averaged operators by Ogura and Yamada (Numer Func Anal Opt 32(1--2):113--137, 2002) and the three-operator splitting by Davis and Yin (Set-Valued Var Anal 25(4):829--858, 2017) are tight. The analysis relies on the scaled relative graph, a geometric tool recently proposed by Ryu, Hannah, and Yin (arXiv:1902.09788, 2019).

Submitted 27 April, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

arXiv:1909.09747 [pdf, ps, other]

Finding the forward-Douglas-Rachford-forward method

Authors: Ernest K. Ryu, Bang Cong Vu

Abstract: We consider the monotone inclusion problem with a sum of 3 operators, in which 2 are monotone and 1 is monotone-Lipschitz. The classical Douglas--Rachford and Forward-backward-forward methods respectively solve the monotone inclusion problem with a sum of 2 monotone operators and a sum of 1 monotone and 1 monotone-Lipschitz operators. We first present a method that naturally combines Douglas--Rachford and Forward-backward-forward and show that it solves the 3 operator problem under further assumptions, but fails in general. We then present a method that naturally combines Douglas--Rachford and forward-reflected-backward, a recently proposed alternative to Forward-backward-forward by Malitsky and Tam [arXiv:1808.04162, 2018]. We show that this second method solves the 3 operator problem generally, without further assumptions.

Submitted 16 October, 2019; v1 submitted 20 September, 2019; originally announced September 2019.

Comments: To appear in Journal of Optimization Theory and Applications

MSC Class: 47H05; 47H09; 90C25

arXiv:1909.06479 [pdf, other]

Decentralized Proximal Gradient Algorithms with Linear Convergence Rates

Authors: Sulaiman A. Alghunaim, Ernest K. Ryu, Kun Yuan, Ali H. Sayed

Abstract: This work studies a class of non-smooth decentralized multi-agent optimization problems where the agents aim at minimizing a sum of local strongly-convex smooth components plus a common non-smooth term. We propose a general primal-dual algorithmic framework that unifies many existing state-of-the-art algorithms. We establish linear convergence of the proposed method to the exact solution in the presence of the non-smooth term. Moreover, for the more general class of problems with agent specific non-smooth terms, we show that linear convergence cannot be achieved (in the worst case) for the class of algorithms that uses the gradients and the proximal mappings of the smooth and non-smooth parts, respectively. We further provide a numerical counterexample that shows how some state-of-the-art algorithms fail to converge linearly for strongly-convex objectives and different local non-smooth terms.

Submitted 9 July, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

Comments: To appear in IEEE Transactions on Automatic Control

arXiv:1906.12141 [pdf, other]

MGOS: A Library for Molecular Geometry and its Operating System

Authors: Deok-Soo Kima, Joonghyun Ryua, Youngsong Choa, Mokwon Leeb, Jehyun Cha, Chanyoung Song, Sangwha Kim, Roman A Laskowskid, Kokichi Sugihara, Jong Bhak, Seong Eon Ryu

Abstract: The geometry of atomic arrangement underpins the structural understanding of molecules in many fields. However, no general framework of mathematical/computational theory for the geometry of atomic arrangement exists. Here we present "Molecular Geometry (MG)" as a theoretical framework accompanied by "MG Operating System (MGOS)" which consists of callable functions implementing the MG theory. MG allows researchers to model complicated molecular structure problems in terms of elementary yet standard notions of volume, area, etc. and MGOS frees them from the hard and tedious task of developing/implementing geometric algorithms so that they can focus more on their primary research issues. MG facilitates simpler modeling of molecular structure problems; MGOS functions can be conveniently embedded in application programs for the efficient and accurate solution of geometric queries involving atomic arrangements. The use of MGOS in problems involving spherical entities is akin to the use of math libraries in general purpose programming languages in science and engineering.

Submitted 28 June, 2019; originally announced June 2019.

arXiv:1905.10899 [pdf, other]

ODE Analysis of Stochastic Gradient Methods with Optimism and Anchoring for Minimax Problems

Authors: Ernest K. Ryu, Kun Yuan, Wotao Yin

Abstract: Despite remarkable empirical success, the training dynamics of generative adversarial networks (GAN), which involves solving a minimax game using stochastic gradients, is still poorly understood. In this work, we analyze last-iterate convergence of simultaneous gradient descent (simGD) and its variants under the assumption of convex-concavity, guided by a continuous-time analysis with differential equations. First, we show that simGD, as is, converges with stochastic sub-gradients under strict convexity in the primal variable. Second, we generalize optimistic simGD to accommodate an optimism rate separate from the learning rate and show its convergence with full gradients. Finally, we present anchored simGD, a new method, and show convergence with stochastic subgradients.

Submitted 11 October, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

arXiv:1905.05406 [pdf, other]

Plug-and-Play Methods Provably Converge with Properly Trained Denoisers

Authors: Ernest K. Ryu, Jialin Liu, Sicheng Wang, Xiaohan Chen, Zhangyang Wang, Wotao Yin

Abstract: Plug-and-play (PnP) is a non-convex framework that integrates modern denoising priors, such as BM3D or deep learning-based denoisers, into ADMM or other proximal algorithms. An advantage of PnP is that one can use pre-trained denoisers when there is not sufficient data for end-to-end training. Although PnP has been recently studied extensively with great empirical success, theoretical analysis addressing even the most basic question of convergence has been insufficient. In this paper, we theoretically establish convergence of PnP-FBS and PnP-ADMM, without using diminishing stepsizes, under a certain Lipschitz condition on the denoisers. We then propose real spectral normalization, a technique for training deep learning-based denoisers to satisfy the proposed Lipschitz condition. Finally, we present experimental results validating the theory.

Submitted 14 May, 2019; originally announced May 2019.

Comments: Published in the International Conference on Machine Learning, 2019

arXiv:1902.09788 [pdf, other]

doi 10.1007/s10107-021-01639-w

Scaled Relative Graph: Nonexpansive operators via 2D Euclidean Geometry

Authors: Ernest K. Ryu, Robert Hannah, Wotao Yin

Abstract: Many iterative methods in applied mathematics can be thought of as fixed-point iterations, and such algorithms are usually analyzed analytically, with inequalities. In this paper, we present a geometric approach to analyzing contractive and nonexpansive fixed point iterations with a new tool called the scaled relative graph (SRG). The SRG provides a correspondence between nonlinear operators and subsets of the 2D plane. Under this framework, a geometric argument in the 2D plane becomes a rigorous proof of convergence.

Submitted 16 June, 2021; v1 submitted 26 February, 2019; originally announced February 2019.

Comments: Published in Mathematical Programming

MSC Class: 47H05; 47H09; 51M04; 90C25

arXiv:1812.00146 [pdf, other]

Operator Splitting Performance Estimation: Tight contraction factors and optimal parameter selection

Authors: Ernest K. Ryu, Adrien B. Taylor, Carolina Bergeling, Pontus Giselsson

Abstract: We propose a methodology for studying the performance of common splitting methods through semidefinite programming. We prove tightness of the methodology and demonstrate its value by presenting two applications of it. First, we use the methodology as a tool for computer-assisted proofs to prove tight analytical contraction factors for Douglas--Rachford splitting that are likely too complicated for a human to find bare-handed. Second, we use the methodology as an algorithmic tool to computationally select the optimal splitting method parameters by solving a series of semidefinite programs.

Submitted 30 April, 2020; v1 submitted 1 December, 2018; originally announced December 2018.

Comments: Published in the SIAM Journal on Optimization

MSC Class: 47H05 47H09 68Q25 90C22 90C25 90C30 90C60

arXiv:1810.13100 [pdf, other]

Splitting with Near-Circulant Linear Systems: Applications to Total Variation CT and PET

Authors: Ernest K. Ryu, Seyoon Ko, Joong-Ho Won

Abstract: Many imaging problems, such as total variation reconstruction of X-ray computed tomography (CT) and positron-emission tomography (PET), are solved via a convex optimization problem with near-circulant, but not actually circulant, linear systems. The popular methods to solve these problems, alternating direction method of multipliers (ADMM) and primal-dual hybrid gradient (PDHG), do not directly utilize this structure. Consequently, ADMM requires a costly matrix inversion as a subroutine, and PDHG takes too many iterations to converge. In this paper, we present near-circulant splitting (NCS), a novel splitting method that leverages the near-circulant structure. We show that NCS can converge with an iteration count close to that of ADMM, while paying a computational cost per iteration close to that of PDHG. Through experiments on a CUDA GPU, we empirically validate the theory and demonstrate that NCS can effectively utilize the parallel computing capabilities of CUDA.

Submitted 29 November, 2019; v1 submitted 31 October, 2018; originally announced October 2018.

Comments: Published in SIAM Journal on Scientific Computing

arXiv:1810.11167 [pdf, other]

doi 10.1007/s11590-019-01520-y

Linear Convergence of Cyclic SAGA

Authors: Youngsuk Park, Ernest K. Ryu

Abstract: In this work, we present and analyze C-SAGA, a (deterministic) cyclic variant of SAGA. C-SAGA is an incremental gradient method that minimizes a sum of differentiable convex functions by cyclically accessing their gradients. Even though the theory of stochastic algorithms is more mature than that of cyclic counterparts in general, practitioners often prefer cyclic algorithms. We prove C-SAGA converges linearly under the standard assumptions. Then, we compare the rate of convergence with the full gradient method, (stochastic) SAGA, and incremental aggregated gradient (IAG), theoretically and experimentally.

Submitted 8 January, 2020; v1 submitted 25 October, 2018; originally announced October 2018.

Comments: Published in Optimization Letters

arXiv:1802.07534 [pdf, other]

Uniqueness of DRS as the 2 Operator Resolvent-Splitting and Impossibility of 3 Operator Resolvent-Splitting

Authors: Ernest K. Ryu

Abstract: Given the success of Douglas--Rachford splitting (DRS), it is natural to ask whether DRS can be generalized. Are there other 2 operator resolvent-splittings sharing the favorable properties of DRS? Can DRS be generalized to 3 operators? This work presents the answers: no and no. In a certain sense, DRS is the unique 2 operator resolvent-splitting, and generalizing DRS to 3 operators is impossible without lifting, where lifting roughly corresponds to enlarging the problem size. The impossibility result further raises a question. How much lifting is necessary to generalize DRS to 3 operators? This work presents the answer by providing a novel 3 operator resolvent-splitting with provably minimal lifting that directly generalizes DRS.

Submitted 20 May, 2019; v1 submitted 21 February, 2018; originally announced February 2018.

Comments: Published in Mathematical Programming

arXiv:1801.06618 [pdf, ps, other]

doi 10.1007/s10589-019-00130-9

Douglas--Rachford Splitting and ADMM for Pathological Convex Optimization

Authors: Ernest K. Ryu, Yanli Liu, Wotao Yin

Abstract: Despite the vast literature on DRS and ADMM, there has been very little work analyzing their behavior under pathologies. Most analyses assume a primal solution exists, a dual solution exists, and strong duality holds. When these assumptions are not met, i.e., under pathologies, the theory often breaks down and the empirical performance may degrade significantly. In this paper, we establish that DRS only requires strong duality to work, in the sense that asymptotically iterates are approximately feasible and approximately optimal.

Submitted 9 September, 2019; v1 submitted 19 January, 2018; originally announced January 2018.

Comments: Published in Computational Optimization and Applications

MSC Class: 90C46; 49N15; 90C25

arXiv:1712.10279 [pdf, other]

Vector and Matrix Optimal Mass Transport: Theory, Algorithm, and Applications

Authors: Ernest K. Ryu, Yongxin Chen, Wuchen Li, Stanley Osher

Abstract: In many applications such as color image processing, data has more than one piece of information associated with each spatial coordinate, and in such cases the classical optimal mass transport (OMT) must be generalized to handle vector-valued or matrix-valued densities. In this paper, we discuss the vector and matrix optimal mass transport and present three contributions. We first present a rigorous mathematical formulation for these setups and provide analytical results including existence of solutions and strong duality. Next, we present simple, scalable, and parallelizable methods to solve the vector and matrix-OMT problems. Finally, we implement the proposed methods on a CUDA GPU and present experiments and applications.

Submitted 16 June, 2018; v1 submitted 29 December, 2017; originally announced December 2017.

Comments: 22 pages, 5 figures, 3 tables

MSC Class: 65K10; 65K05; 90C25

arXiv:1709.02838 [pdf, other]

Cosmic Divergence, Weak Cosmic Convergence, and Fixed Points at Infinity

Authors: Ernest K. Ryu

Abstract: To characterize the asymptotic behavior of fixed-point iterations of non-expansive operators with no fixed points, Bauschke et al. [Fixed Point Theory Appl. (2016)] recently studied cosmic convergence and conjectured that cosmic convergence always holds. This paper presents a cosmically divergent counter example, which disproves this conjecture. This paper also demonstrates, with a counter example, that cosmic convergence can be weak in infinite dimensions. Finally, this paper shows positive results relating to cosmic convergence that provide an interpretation of cosmic accumulation points as fixed points at infinity.

Submitted 4 June, 2018; v1 submitted 8 September, 2017; originally announced September 2017.

MSC Class: 47H09 (Primary); 90C25 (Secondary)

Showing 1–50 of 57 results for author: Ryu, E