-
Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity
Authors:
Qian Yu,
Yining Wang,
Baihe Huang,
Qi Lei,
Jason D. Lee
Abstract:
Optimization of convex functions under stochastic zeroth-order feedback has been a major and challenging question in online learning. In this work, we consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries. We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds. We propose an algorithm that features a combination of a bootstrapping stage and a mirror-descent stage. Our main technical innovation consists of a sharp characterization for the spherical-sampling gradient estimator under higher-order smoothness conditions, which allows the algorithm to optimally balance the bias-variance tradeoff, and a new iterative method for the bootstrapping stage, which maintains the performance for unbounded Hessian.
Submitted 27 June, 2024;
originally announced June 2024.
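As a rough illustration of the spherical-sampling gradient estimator named in the abstract above, the sketch below implements the textbook two-point version, $\hat g = \frac{d}{2δ}\,(f(x+δu)-f(x-δu))\,u$ with $u$ uniform on the unit sphere. This is an assumption made for illustration and may differ from the estimator analyzed in the paper; the function names, the smoothing radius `delta`, and the noiseless quadratic test function are all hypothetical.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a point uniformly from the unit sphere in R^d (normalized Gaussian)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def spherical_gradient_estimate(f, x, delta, rng):
    """Two-point estimate: (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u."""
    d = len(x)
    u = sample_unit_sphere(d, rng)
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]


def averaged_estimate(f, x, delta, n_samples, seed=0):
    """Average many independent estimates to reduce the variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        g = spherical_gradient_estimate(f, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]


def quadratic(x):
    """Hypothetical test objective f(x) = ||x||^2 / 2, whose gradient is x."""
    return 0.5 * sum(v * v for v in x)
```

For this quadratic the two-point difference equals $2δ\langle x,u\rangle$ exactly, so the estimator is unbiased and `averaged_estimate(quadratic, [1.0, 2.0, -1.0], 0.1, 20000)` should land close to the true gradient. For general smooth functions the radius `delta` controls a bias term, which is the bias-variance tradeoff the abstract refers to.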
-
Modeling and simulations of high-density two-phase flows using projection-based Cahn-Hilliard Navier-Stokes equations
Authors:
Ali Rabeh,
Makrand A. Khanwale,
Jonghyun Lee,
Baskar Ganapathysubramanian
Abstract:
Accurately modeling the dynamics of high-density ratio ($\mathcal{O}(10^5)$) two-phase flows is important for many applications in material science and manufacturing. In this work, we consider numerical simulations of molten metal undergoing microgravity oscillations. Accurate simulation of the oscillation dynamics allows us to characterize the interplay between the two fluids' surface tension and density ratio, which is an important consideration for terrestrial manufacturing applications. We present a projection-based computational framework for solving a thermodynamically-consistent Cahn-Hilliard Navier-Stokes equations for two-phase flows under these large density ratios. A modified version of the pressure-decoupled solver based on the Helmholtz-Hodge decomposition presented in Khanwale et al. [$\textit{A projection-based, semi-implicit time-stepping approach for the Cahn-Hilliard Navier-Stokes equations on adaptive octree meshes.}$, Journal of Computational Physics 475 (2023): 111874] is used. We present a comprehensive convergence study to investigate the effect of mesh resolution, time-step, and interfacial thickness on droplet-shape oscillations. We deploy our framework to predict the oscillation behavior of three physical systems exhibiting very large density ratios ($10^4-10^5:1$) that have previously never been performed.
Submitted 1 July, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Minimal grid diagrams of the prime alternating knots with 13 crossings
Authors:
Hwa Jeong Lee,
Alexander Stoimenow,
Gyo Taek Jin
Abstract:
A knot is a closed loop in space without self-intersection. Two knots are equivalent if there is a self homeomorphism of space bringing one onto the other. An arc presentation is an embedding of a knot in the union of finitely many half planes with a common boundary line such that each half plane contains a simple arc of the knot. The minimal number of such half planes among all arc presentations of a given knot is called the arc index of the knot. A knot is usually presented as a planar diagram with finitely many crossings of two strands where one of the strands goes over the other. A grid diagram is a planar diagram which is a non-simple rectilinear polygon such that vertical edges always cross over horizontal edges at all crossings. It is easily seen that an arc presentation gives rise to a grid diagram and vice versa. It is known that the arc index of an alternating knot is two plus its minimal crossing number. There are 4878 prime alternating knots with minimal crossing number 13. We obtained minimal arc presentations of them in the form of grid diagrams having 15 vertical segments. This is a continuation of the works on prime alternating knots of 11 crossings and 12 crossings.
Submitted 31 March, 2024;
originally announced June 2024.
-
The Powell Conjecture in genus four
Authors:
Sangbum Cho,
Yuya Koda,
Jung Hoon Lee
Abstract:
The Powell Conjecture states that four specific elements suffice to generate the Goeritz group of the Heegaard splitting of the $3$-sphere. We show that this conjecture is true when the genus of the splitting is four.
Submitted 19 June, 2024;
originally announced June 2024.
-
Obstructing two-torsion in the rational knot concordance group
Authors:
Jaewon Lee
Abstract:
It is well known that there are many 2-torsion elements in the classical knot concordance group. On the other hand, it is not known if there is any torsion element in the rational knot concordance group $\mathcal{C}_\mathbb{Q}$. Cha defined the algebraic rational concordance group $\mathcal{AC}_\mathbb{Q}$, an analogue of the classical algebraic concordance group, and showed that $\mathcal{AC}_\mathbb{Q}\cong\mathbb{Z}^\infty\oplus\mathbb{Z}_2^\infty\oplus\mathbb{Z}_4^\infty$. The knots that represent 2-torsions in $\mathcal{AC}_\mathbb{Q}$ potentially have order $2$ in $\mathcal{C}_\mathbb{Q}$. In this paper, we provide an obstruction for knots of order $2$ in $\mathcal{AC}_\mathbb{Q}$ from being of finite order in $\mathcal{C}_\mathbb{Q}$. Moreover, we give a family consisting of such knots that generates an infinite rank subgroup of $\mathcal{C}_\mathbb{Q}$. We also note that Cha proved that in higher dimensions, the algebraic rational concordance order is the same as the rational knot concordance order. Our obstruction is based on the localized von Neumann $ρ$-invariant.
Submitted 18 June, 2024;
originally announced June 2024.
-
Non-freeness of parabolic two-generator groups
Authors:
Philip Choi,
Kyeonghee Jo,
Hyuk Kim,
Junho Lee
Abstract:
A complex number $λ$ is said to be non-free if the subgroup of $SL(2,\mathbb{C})$ generated by $$X=\begin{pmatrix} 1 & 1\\ 0 & 1 \end{pmatrix} \,\, \text{and}\,\,\, Y_λ=\begin{pmatrix} 1 & 0\\ λ & 1 \end{pmatrix}$$ is not a free group of rank 2. In this case the number $λ$ is called a relation number, and it has been a long standing problem to determine the relation numbers. In this paper, we characterize the relation numbers by establishing the equivalence between $λ$ being a relation number and $u:=\sqrt{- λ}$ being a root of a `generalized Chebyshev polynomial'. The generalized Chebyshev polynomials of degree $k$ are given by a sequence of $k$ integers $(n_1, n_2,\cdots, n_k)$ using the usual recursive formula, and thereby can be studied systematically using continuants and continued fractions. Such formulation, then, enables us to prove that, the question whether a given number $λ$ is a relation number of $u$-degree $k$ can be answered by checking only finitely many generalized Chebyshev polynomials. Based on these theorems, we design an algorithm deciding any given number is a relation number with minimal degree $k$. With its computer implementation we provide a few sample examples, with a particular emphasis on the well known conjecture that every rational number in the interval $(-4, 4)$ is a relation number.
Submitted 17 June, 2024;
originally announced June 2024.
-
Hamilton-Jacobi Based Policy-Iteration via Deep Operator Learning
Authors:
Jae Yong Lee,
Yeoneung Kim
Abstract:
The framework of deep operator network (DeepONet) has been widely exploited thanks to its capability of solving high dimensional partial differential equations. In this paper, we incorporate DeepONet with a recently developed policy iteration scheme to numerically solve optimal control problems and the corresponding Hamilton--Jacobi--Bellman (HJB) equations. A notable feature of our approach is that once the neural network is trained, the solution to the optimal control problem and HJB equations with different terminal functions can be inferred quickly thanks to the unique feature of operator learning. Furthermore, a quantitative analysis of the accuracy of the algorithm is carried out via comparison principles of viscosity solutions. The effectiveness of the method is verified with various examples, including 10-dimensional linear quadratic regulator problems (LQRs).
Submitted 16 June, 2024;
originally announced June 2024.
-
A Characterization of backward bounded solutions
Authors:
Minkyu Kwak,
Jihoon Lee,
Bataa Lkhagvasuren
Abstract:
We prove that the collection $\mathcal M_{-\infty}$ of backward bounded solutions for a semilinear evolution equation is the graph of an upper hemicontinuous set-valued function from the low Fourier modes to the higher Fourier modes, which is invariant and contains the global attractor. We also show that there exists a limit $\mathcal M_{\infty}$ of finite dimensional Lipschitz manifolds $\mathcal M_t$ generated by the time $t$-maps ($t>0$) from the flat manifold $\mathcal M_0$ with the Hausdorff distance and we find $\mathcal M_{\infty} \subset \mathcal M_{-\infty}$. No spectral gap conditions are assumed.
Submitted 13 June, 2024;
originally announced June 2024.
-
Scaling Laws in Linear Regression: Compute, Parameters, and Data
Authors:
Licong Lin,
Jingfeng Wu,
Sham M. Kakade,
Peter L. Bartlett,
Jason D. Lee
Abstract:
Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approximation, bias, and variance errors, where the variance error increases with model size. This disagrees with the general form of neural scaling laws, which predict that increasing model size monotonically improves performance.
We study the theory of scaling laws in an infinite dimensional linear regression setup. Specifically, we consider a model with $M$ parameters as a linear function of sketched covariates. The model is trained by one-pass stochastic gradient descent (SGD) using $N$ data. Assuming the optimal parameter satisfies a Gaussian prior and the data covariance matrix has a power-law spectrum of degree $a>1$, we show that the reducible part of the test error is $Θ(M^{-(a-1)} + N^{-(a-1)/a})$. The variance error, which increases with $M$, is dominated by the other errors due to the implicit regularization of SGD, thus disappearing from the bound. Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
Submitted 12 June, 2024;
originally announced June 2024.
-
Design and Scheduling of an AI-based Queueing System
Authors:
Jiung Lee,
Hongseok Namkoong,
Yibo Zeng
Abstract:
To leverage prediction models to make optimal scheduling decisions in service systems, we must understand how predictive errors impact congestion due to externalities on the delay of other jobs. Motivated by applications where prediction models interact with human servers (e.g., content moderation), we consider a large queueing system comprising of many single server queues where the class of a job is estimated using a prediction model. By characterizing the impact of mispredictions on congestion cost in heavy traffic, we design an index-based policy that incorporates the predicted class information in a near-optimal manner. Our theoretical results guide the design of predictive models by providing a simple model selection procedure with downstream queueing performance as a central concern, and offer novel insights on how to design queueing systems with AI-based triage. We illustrate our framework on a content moderation task based on real online comments, where we construct toxicity classifiers by finetuning large language models.
Submitted 10 June, 2024;
originally announced June 2024.
-
Summing divergent matrix series
Authors:
Rongbiao Wang,
JungHo Lee,
Lek-Heng Lim
Abstract:
We extend several celebrated methods in classical analysis for summing series of complex numbers to series of complex matrices. These include the summation methods of Abel, Borel, Cesáro, Euler, Lambert, Nörlund, and Mittag-Leffler, which are frequently used to sum scalar series that are divergent in the conventional sense. One feature of our matrix extensions is that they are fully noncommutative generalizations of their scalar counterparts -- not only is the scalar series replaced by a matrix series, positive weights are replaced by positive definite matrix weights, order on $\mathbb{R}$ replaced by Loewner order, exponential function replaced by matrix exponential function, etc. We will establish the regularity of our matrix summation methods, i.e., when applied to a matrix series convergent in the conventional sense, we obtain the same value for the sum. Our second goal is to provide numerical algorithms that work in conjunction with these summation methods. We discuss how the block and mixed-block summation algorithms, the Kahan compensated summation algorithm, may be applied to matrix sums with similar roundoff error bounds. These summation methods and algorithms apply not only to power or Taylor series of matrices but to any general matrix series including matrix Fourier and Dirichlet series. We will demonstrate the utility of these summation methods: establishing a Fejér's theorem and alleviating the Gibbs phenomenon for matrix Fourier series; extending the domains of matrix functions and accurately evaluating them; enhancing the matrix Padé approximation and Schur--Parlett algorithms; and more.
Submitted 30 May, 2024;
originally announced May 2024.
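The abstract above mentions applying the Kahan compensated summation algorithm to matrix sums. A minimal sketch of what that could look like, assuming the compensation is simply carried entrywise and matrices are plain nested lists of floats; the function name `kahan_matrix_sum` is hypothetical and this is not taken from the paper:

```python
def kahan_matrix_sum(terms):
    """Entrywise Kahan compensated sum of equally sized matrices (lists of rows)."""
    it = iter(terms)
    first = next(it)
    total = [list(row) for row in first]          # running sum
    comp = [[0.0] * len(row) for row in first]    # running compensation (lost low-order bits)
    for term in it:
        for i, row in enumerate(term):
            for j, x in enumerate(row):
                y = x - comp[i][j]                # apply the correction from earlier steps
                t = total[i][j] + y               # low-order bits of y may be lost here
                comp[i][j] = (t - total[i][j]) - y  # recover what was lost
                total[i][j] = t
    return total


def naive_matrix_sum(terms):
    """Plain entrywise sum, for comparison."""
    it = iter(terms)
    total = [list(row) for row in next(it)]
    for term in it:
        for i, row in enumerate(term):
            for j, x in enumerate(row):
                total[i][j] += x
    return total
```

As a usage example, summing `[[1.0]]` followed by many copies of `[[1e-16]]` leaves the naive sum stuck at `1.0`, because each increment is below half a unit in the last place, while the compensated sum tracks the small terms.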
-
An Unconstrained Formulation of Some Constrained Partial Differential Equations and its Application to Finite Neuron Methods
Authors:
Jiwei Jia,
Young Ju Lee,
Ruitong Shan
Abstract:
In this paper, we present a new framework how a PDE with constraints can be formulated into a sequence of PDEs with no constraints, whose solutions are convergent to the solution of the PDE with constraints. This framework is then used to build a novel finite neuron method to solve the 2nd order elliptic equations with the Dirichlet boundary condition. Our algorithm is the first algorithm, proven to lead to shallow neural network solutions with an optimal H1 norm error. We show that a widely used penalized PDE, which imposes the Dirichlet boundary condition weakly can be interpreted as the first element of the sequence of PDEs within our framework. Furthermore, numerically, we show that it may not lead to the solution with the optimal H1 norm error bound in general. On the other hand, we theoretically demonstrate that the second and later elements of a sequence of PDEs can lead to an adequate solution with the optimal H1 norm error bound. A number of sample tests are performed to confirm the effectiveness of the proposed algorithm and the relevant theory.
Submitted 27 May, 2024;
originally announced May 2024.
-
Maximal operators given by Fourier multipliers with dilation of fractional dimensions
Authors:
Jin Bong Lee,
Jinsol Seo
Abstract:
In this paper, we investigate $L^p$ bounds of maximal Fourier multiplier operators with dilation of fractional dimensions. For the Fourier multipliers, we suggest a criterion related to dimensions of dilation sets which guarantees $L^p$ bounds of the maximal operators for each $p$. Our criterion covers Mikhlin-type multipliers, multipliers with limited decay, and multipliers with slow decay.
Submitted 27 May, 2024;
originally announced May 2024.
-
Kernel-based optimally weighted conformal prediction intervals
Authors:
Jonghyeok Lee,
Chen Xu,
Yao Xie
Abstract:
Conformal prediction has been a popular distribution-free framework for uncertainty quantification. In this paper, we present a novel conformal prediction method for time-series, which we call Kernel-based Optimally Weighted Conformal Prediction Intervals (KOWCPI). Specifically, KOWCPI adapts the classic Reweighted Nadaraya-Watson (RNW) estimator for quantile regression on dependent data and learns optimal data-adaptive weights. Theoretically, we tackle the challenge of establishing a conditional coverage guarantee for non-exchangeable data under strong mixing conditions on the non-conformity scores. We demonstrate the superior performance of KOWCPI on real time-series against state-of-the-art methods, where KOWCPI achieves narrower confidence intervals without losing coverage.
Submitted 27 May, 2024;
originally announced May 2024.
-
Carleson measures for weighted Bergman--Zygmund spaces
Authors:
Hong Rae Cho,
Hyungwoon Koo,
Young Joo Lee,
Atte Pennanen,
Jouni Rättyä,
Fanglei Wu
Abstract:
For $0<p<\infty$, $Ψ:[0,\infty)\to(0,\infty)$ and a finite positive Borel measure $μ$ on the unit disc $\mathbb{D}$, the Lebesgue--Zygmund space $L^p_{μ,Ψ}$ consists of all measurable functions $f$ such that $\lVert f \rVert_{L_{μ, Ψ}^{p}}^p =\int_{\mathbb{D}}|f|^pΨ(|f|)\,dμ< \infty$. For an integrable radial function $ω$ on $\mathbb{D}$, the corresponding weighted Bergman-Zygmund space $A_{ω, Ψ}^{p}$ is the set of all analytic functions in $L_{μ, Ψ}^{p}$ with $dμ=ω\,dA$.
The purpose of the paper is to characterize bounded (and compact) embeddings $A_{ω,Ψ}^{p}\subset L_{μ, Φ}^{q}$, when $0<p\le q<\infty$, the functions $Ψ$ and $Φ$ are essential monotonic, and $Ψ,Φ,ω$ satisfy certain doubling properties. The tools developed on the way to the main results are applied to characterize bounded and compact integral operators acting from $A^p_{ω,Ψ}$ to $A^q_{ν,Φ}$, provided $ν$ admits the same doubling property as $ω$.
Submitted 22 May, 2024;
originally announced May 2024.
-
Witten deformation and divergence-free symmetric Killing 2-tensors
Authors:
Kwangho Choi,
Junho Lee
Abstract:
Using a Morse function and a Witten deformation argument, we obtain an upper bound for the dimension of the space of divergence-free symmetric Killing $p$-tensors on a closed Riemannian manifold, and calculate it explicitly for $p=2$.
Submitted 16 May, 2024;
originally announced May 2024.
-
The standard generators of the tetrahedron algebra and their look-alikes
Authors:
Jae-Ho Lee
Abstract:
The tetrahedron algebra $\boxtimes$ is an infinite-dimensional Lie algebra defined by generators $\{x_{ij} \mid i, j \in \{0, 1, 2, 3\}, i \neq j\}$ and some relations, including the Dolan-Grady relations. These twelve generators are called standard. We introduce a type of element in $\boxtimes$ that "looks like" a standard generator. For mutually distinct $h, i, j, k \in \{0, 1, 2, 3\}$, consider the standard generator $x_{ij}$ of $\boxtimes$. An element $ξ\in \boxtimes$ is called $x_{ij}$-like whenever both (i) $ξ$ commutes with $x_{ij}$; (ii) $ξ$ and $x_{hk}$ satisfy a Dolan-Grady relation. Pick mutually distinct $i,j,k \in \{0,1,2,3\}$. In our main result, we find an attractive basis for $\boxtimes$ with the property that every basis element is either $x_{ij}$-like or $x_{jk}$-like or $x_{ki}$-like. We discuss this basis from multiple points of view.
Submitted 8 May, 2024;
originally announced May 2024.
-
Incorporating changeable attitudes toward vaccination into an SIR infectious disease model
Authors:
Yi Jiang,
Kristin M. Kurianski,
Jane H. Lee,
Yanping Ma,
Daniel Cicala,
Glenn Ledder
Abstract:
We develop a mechanistic model that classifies individuals both in terms of epidemiological status (SIR) and vaccination attitude (willing or unwilling), with the goal of discovering how disease spread is influenced by changing opinions about vaccination. Analysis of the model identifies existence and stability criteria for both disease-free and endemic disease equilibria. The analytical results, supported by numerical simulations, show that attitude changes induced by disease prevalence can destabilize endemic disease equilibria, resulting in limit cycles.
Submitted 6 May, 2024;
originally announced May 2024.
-
Impact of EIP-4844 on Ethereum: Consensus Security, Ethereum Usage, Rollup Transaction Dynamics, and Blob Gas Fee Markets
Authors:
Seongwan Park,
Bosul Mun,
Seungyun Lee,
Woojin Jeong,
Jaewook Lee,
Hyeonsang Eom,
Huisu Jang
Abstract:
On March 13, 2024, Ethereum implemented EIP-4844, designed to enhance its role as a data availability layer. While this upgrade reduces data posting costs for rollups, it also raises concerns about its impact on the consensus layer due to increased propagation sizes. Moreover, the broader effects on the overall Ethereum ecosystem remain largely unexplored. In this paper, we conduct an empirical analysis of the impact of EIP-4844 on consensus security, Ethereum usage, rollup transaction dynamics, and the blob gas fee mechanism. We explore changes in synchronization times, provide quantitative assessments of rollup and user behaviors, and deepen the understanding of the blob gas fee mechanism, highlighting both enhancements and areas of concern post-upgrade.
Submitted 6 May, 2024;
originally announced May 2024.
-
Uniqueness of $p$-local truncated Brown-Peterson spectra
Authors:
David Jongwon Lee
Abstract:
When $p$ is an odd prime, we prove that the $\mathbb F_p$-cohomology of $\mathrm{BP}\langle n\rangle$ as a module over the Steenrod algebra determines the $p$-local spectrum $\mathrm{BP}\langle n\rangle$. In particular, we prove that the $p$-local spectrum $\mathrm{BP}\langle n\rangle$ only depends on its $p$-completion $\mathrm{BP}\langle n\rangle_p^\wedge$. As a corollary, this proves that the $p$-local homotopy type of $\mathrm{BP}\langle n\rangle$ does not depend on the ideal by which we take the quotient of $\mathrm{BP}$. In the course of the argument, we show that there is a vanishing line for odd degree classes in the Adams spectral sequence for endomorphisms of $\mathrm{BP}\langle n\rangle$. We also prove that there are enough endomorphisms of $\mathrm{BP}\langle n\rangle$ in a suitable sense. When $p=2$, we obtain the results for $n\leq 3$.
Submitted 1 May, 2024;
originally announced May 2024.
-
Error analysis for finite element operator learning methods for solving parametric second-order elliptic PDEs
Authors:
Youngjoon Hong,
Seungchan Ko,
Jaeyong Lee
Abstract:
In this paper, we provide a theoretical analysis of a type of operator learning method without data reliance based on the classical finite element approximation, which is called the finite element operator network (FEONet). We first establish the convergence of this method for general second-order linear elliptic PDEs with respect to the parameters for neural network approximation. In this regard, we address the role of the condition number of the finite element matrix in the convergence of the method. Secondly, we derive an explicit error estimate for the self-adjoint case. For this, we investigate some regularity properties of the solution in certain function classes for a neural network approximation, verifying the sufficient condition for the solution to have the desired regularity. Finally, we will also conduct some numerical experiments that support the theoretical findings, confirming the role of the condition number of the finite element matrix in the overall convergence.
Submitted 27 April, 2024;
originally announced April 2024.
-
Boosting e-BH via conditional calibration
Authors:
Junu Lee,
Zhimei Ren
Abstract:
The e-BH procedure is an e-value-based multiple testing procedure that provably controls the false discovery rate (FDR) under any dependence structure between the e-values. Despite this appealing theoretical FDR control guarantee, the e-BH procedure often suffers from low power in practice. In this paper, we propose a general framework that boosts the power of e-BH without sacrificing its FDR control under arbitrary dependence. This is achieved by the technique of conditional calibration, where we take as input the e-values and calibrate them to be a set of "boosted e-values" that are guaranteed to be no less -- and are often more -- powerful than the original ones. Our general framework is explicitly instantiated in three classes of multiple testing problems: (1) testing under parametric models, (2) conditional independence testing under the model-X setting, and (3) model-free conformalized selection. Extensive numerical experiments show that our proposed method significantly improves the power of e-BH while continuing to control the FDR. We also demonstrate the effectiveness of our method through an application to an observational study dataset for identifying individuals whose counterfactuals satisfy certain properties.
Submitted 26 April, 2024;
originally announced April 2024.
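For readers unfamiliar with the baseline that the entry above sets out to boost, the following is a minimal sketch of the standard (unboosted) e-BH rule: with m e-values and target level alpha, it rejects the k* hypotheses with the largest e-values, where k* is the largest k such that the k-th largest e-value is at least m / (alpha * k). The function name `e_bh` and the example e-values are illustrative assumptions; the conditional-calibration step proposed in the paper is not implemented here.

```python
def e_bh(e_values, alpha):
    """Return the (sorted) indices rejected by the base e-BH rule at level alpha."""
    m = len(e_values)
    # Hypothesis indices ordered by e-value, largest first.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        # The k-th largest e-value must reach the threshold m / (alpha * k).
        if e_values[idx] >= m / (alpha * k):
            k_star = k
    # Reject the k_star hypotheses with the largest e-values.
    return sorted(order[:k_star])


# Hypothetical e-values, chosen only for illustration.
example_e_values = [60.0, 1.2, 30.0, 0.4, 8.0]
rejected = e_bh(example_e_values, alpha=0.1)
print(rejected)
```

In this toy input the thresholds are 50, 25, 16.7, 12.5 and 10 for k = 1, ..., 5, so only the two largest e-values pass. It also shows why the base rule can be conservative: an e-value of 8 is not rejected at level 0.1.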
-
Around the positive graph conjecture
Authors:
David Conlon,
Joonkyung Lee,
Leo Versteegen
Abstract:
A graph $H$ is said to be positive if the homomorphism density $t_H(G)$ is non-negative for all weighted graphs $G$. The positive graph conjecture proposes a characterisation of such graphs, saying that a graph is positive if and only if it is symmetric, in the sense that it is formed by gluing two copies of some subgraph along an independent set. We prove several results relating to this conjecture. First, we make progress towards the conjecture itself by showing that any connected positive graph must have a vertex of even degree. We then make use of this result to identify some new counterexamples to the analogue of Sidorenko's conjecture for hypergraphs. In particular, we show that, for $r$ odd, every $r$-uniform tight cycle is a counterexample, generalising a recent result of Conlon, Lee and Sidorenko that dealt with the case $r=3$. Finally, we relate the positive graph conjecture to the emerging study of graph codes by showing that any positive graph has vanishing graph code density, thereby improving a result of Alon who proved the same result for symmetric graphs. Our proofs make use of a variety of tools and techniques, including the properties of independence polynomials, hypergraph quasirandomness and discrete Fourier analysis.
Submitted 26 April, 2024;
originally announced April 2024.
-
Linear structures of norm-attaining Lipschitz functions and their complements
Authors:
Geunsu Choi,
Mingu Jung,
Han Ju Lee,
Oscar Roldan
Abstract:
We solve two main questions on linear structures of (non-)norm-attaining Lipschitz functions. First, we show that for every infinite metric space $M$, the set consisting of Lipschitz functions on $M$ which do not strongly attain their norm and the zero contains an isometric copy of $\ell_\infty$, and moreover, those functions can be chosen not to attain their norm as functionals on the Lipschitz-free space over $M$. Second, we prove that for every infinite metric space $M$, neither the set of strongly norm-attaining Lipschitz functions on $M$ nor the union of its complement with zero is ever a linear space. Furthermore, we observe that the set consisting of Lipschitz functions which cannot be approximated by strongly norm-attaining ones and the zero element contains $\ell_\infty$ isometrically in all the known cases. Some natural observations and spaceability results are also investigated for Lipschitz functions that attain their norm in one way but do not in another, for several norm-attainment notions considered in the literature.
Submitted 11 April, 2024;
originally announced April 2024.
-
Gaining or losing perspective for convex multivariate functions on box domains
Authors:
Luze Xu,
Jon Lee
Abstract:
MINLO (mixed-integer nonlinear optimization) formulations of the disjunction between the origin and a polytope via a binary indicator variable are broadly used in nonlinear combinatorial optimization for modeling a fixed cost associated with carrying out a group of activities and a convex cost function associated with the levels of the activities. The perspective relaxation of such models is often used to solve to global optimality in a branch-and-bound context, but it typically requires suitable conic solvers and is not compatible with general-purpose NLP software in the presence of other classes of constraints. This motivates the investigation of when simpler but weaker relaxations may be adequate. Comparing the volume (i.e., Lebesgue measure) of the relaxations as a measure of tightness, we lift some of the results related to the simplex case to the box case. In order to compare the volumes of different relaxations in the box case, it is necessary to find an appropriate concave upper bound that preserves the convexity and is minimal, which is more difficult than in the simplex case. To address the challenge beyond the simplex case, the triangulation approach is used.
Submitted 10 April, 2024;
originally announced April 2024.
-
Methodology for Interpretable Reinforcement Learning for Optimizing Mechanical Ventilation
Authors:
Joo Seung Lee,
Malini Mahendra,
Anil Aswani
Abstract:
Mechanical ventilation is a critical life-support intervention that uses a machine to deliver controlled air and oxygen to a patient's lungs, assisting or replacing spontaneous breathing. While several data-driven approaches have been proposed to optimize ventilator control strategies, they often lack interpretability and agreement with general domain knowledge. This paper proposes a methodology for interpretable reinforcement learning (RL) using decision trees for mechanical ventilation control. Using a causal, nonparametric model-based off-policy evaluation, we evaluate the policies in their ability to gain increases in SpO2 while avoiding aggressive ventilator settings which are known to cause ventilator-induced lung injuries and other complications. Numerical experiments using MIMIC-III data on real patients' intensive care unit stays demonstrate that the decision tree policy outperforms the behavior cloning policy and is comparable to a state-of-the-art RL policy. Future work concerns better aligning the cost function with medical objectives to generate deeper clinical insights.
Submitted 3 April, 2024;
originally announced April 2024.
-
Convex relaxation for the generalized maximum-entropy sampling problem
Authors:
Gabriel Ponte,
Marcia Fampa,
Jon Lee
Abstract:
The generalized maximum-entropy sampling problem (GMESP) is to select an order-$s$ principal submatrix from an order-$n$ covariance matrix, to maximize the product of its $t$ greatest eigenvalues, $0<t\leq s <n$. It is a problem that specializes to two fundamental problems in statistical design theory: (i) maximum-entropy sampling problem (MESP); (ii) binary D-optimality (D-Opt). In the general case, it is motivated by a selection problem in the context of PCA (principal component analysis).
We introduce the first convex-optimization based relaxation for GMESP, study its behavior, compare it to an earlier spectral bound, and demonstrate its use in a branch-and-bound scheme. We find that such an approach is practical when $s-t$ is very small.
Submitted 1 April, 2024;
originally announced April 2024.
-
Towards a classification of $1$-homogeneous distance-regular graphs with positive intersection number $a_1$
Authors:
Jack H. Koolen,
Mamoon Abdullah,
Brhane Gebremichel,
Jae-Ho Lee
Abstract:
Let $Γ$ be a graph with diameter at least two. Then $Γ$ is said to be $1$-homogeneous (in the sense of Nomura) whenever for every pair of adjacent vertices $x$ and $y$ in $Γ$, the distance partition of the vertex set of $Γ$ with respect to both $x$ and $y$ is equitable, and the parameters corresponding to equitable partitions are independent of the choice of $x$ and $y$. Assume $Γ$ is $1$-homogeneous distance-regular with intersection number $a_1>0$ and $D\geqslant 5$. Define $b=b_1/(θ_1+1)$, where $b_1$ is the intersection number and $θ_1$ is the second largest eigenvalue of $Γ$. We show that if intersection number $c_2\geqslant 2$, then $b\geqslant 1$ and one of the following (i)--(vi) holds: (i) $Γ$ is a regular near $2D$-gon, (ii) $Γ$ is a Johnson graph $J(2D,D)$, (iii) $Γ$ is a halved $\ell$-cube where $\ell \in \{2D,2D+1\}$, (iv) $Γ$ is a folded Johnson graph $\bar{J}(4D,2D)$, (v) $Γ$ is a folded halved $(4D)$-cube, (vi) the valency of $Γ$ is bounded by a function of $b$. Using this result, we characterize $1$-homogeneous graphs with classical parameters and $a_1>0$, as well as tight distance-regular graphs.
Submitted 1 April, 2024;
originally announced April 2024.
-
Connections between Reachability and Time Optimality
Authors:
Juho Bae,
Ji Hoon Bai,
Byung-Yoon Lee,
Jun-Yong Lee,
Chang-Hun Lee
Abstract:
This paper presents the concept of an equivalence relation between the set of optimal control problems. By leveraging this concept, we show that the boundary of the reachability set can be constructed by the solutions of time optimal problems. A more general equivalence theorem is also presented. The findings facilitate the use of solution structures from a certain class of optimal control problems to address problems in corresponding equivalent classes. As a byproduct, we state and prove the construction methods of the reachability sets of three-dimensional curves with prescribed curvature bound. The findings are twofold: Firstly, we prove that any boundary point of the reachability set, with the terminal direction taken into account, can be accessed via curves of H, CSC, CCC, or their respective subsegments, where H denotes a helicoidal arc, C a circular arc with maximum curvature, and S a straight segment. Secondly, we show that any boundary point of the reachability set, without considering the terminal direction, can be accessed by curves of CC, CS, or their respective subsegments. These findings extend the developments presented in the literature regarding planar curves, or Dubins car dynamics, into spatial curves in $\mathbb{R}^3$. For higher dimensions, we confirm that the problem of identifying the reachability set of curvature bounded paths subsumes the well-known Markov-Dubins problem. These advancements in understanding the reachability of curvature bounded paths in $\mathbb{R}^3$ hold significant practical implications, particularly in the contexts of mission planning problems and time optimal guidance.
Submitted 27 March, 2024;
originally announced March 2024.
-
Isoperimetric profile function comparisons with Integral Ricci curvature bounds
Authors:
Jihye Lee,
Fabio Ricci
Abstract:
We prove comparison results for the Isoperimetric profile function in the setting of manifolds with integral bounds on the Ricci curvature. We extend previous work of Ni and Wang and Bayle and Rosales under the usual pointwise bounds for the Ricci curvature.
Submitted 23 March, 2024;
originally announced March 2024.
-
The Johnson-Mercier elasticity element in any dimensions
Authors:
Jay Gopalakrishnan,
Johnny Guzman,
Jeonghun J. Lee
Abstract:
Mixed methods for linear elasticity with strongly symmetric stresses of lowest order are studied in this paper. On each simplex, the stress space has piecewise linear components with respect to its Alfeld split (which connects the vertices to the barycenter), generalizing the Johnson-Mercier two-dimensional element to higher dimensions. Further reductions in the stress space in the three-dimensional case (to 24 degrees of freedom per tetrahedron) are possible when the displacement space is reduced to local rigid displacements. Proofs of optimal error estimates of numerical solutions and improved error estimates via postprocessing and the duality argument are presented.
Submitted 19 March, 2024;
originally announced March 2024.
-
How Much Can Reconfigurable Intelligent Surfaces Augment Sky Visibility: A Stochastic Geometry Approach
Authors:
Junse Lee,
Francois Baccelli
Abstract:
This paper uses the theory of point processes and stochastic geometry to quantify the sky visibility experienced by users located in an urban environment. The general idea is to represent the buildings of this environment as a stationary marked point process, where the points represent the building locations and the marks their heights. The point process framework is first used to characterize the distribution of the blockage angle, which limits the visibility of a typical user into the sky due to the obstruction by buildings. In the context of communications, this distribution is useful when users try to connect to the nodes of an aerial or non-terrestrial network in a Line-of-Sight way. Within this context, the point process framework can also be used to investigate the gain of connectivity obtained thanks to Reconfigurable Intelligent Surfaces. Assuming that such surfaces are installed on the top of buildings to extend the user's sky visibility, this point process approach allows one to quantify the gain in visibility and hence the gain in connectivity obtained by the typical user. The distributional properties of visibility-related metrics are cross-validated by comparison to simulation results.
Submitted 13 March, 2024;
originally announced March 2024.
-
A converse of dynamical Mordell--Lang conjecture in positive characteristic
Authors:
Jungin Lee,
Gyeonghyeon Nam
Abstract:
In this paper, we prove the converse of the dynamical Mordell--Lang conjecture in positive characteristic: For every subset $S \subseteq \mathbb{N}_0$ which is a union of finitely many arithmetic progressions along with finitely many $p$-sets of the form $\left \{ \sum_{j=1}^{m} c_j p^{k_jn_j} : n_j \in \mathbb{N}_0 \right \}$ ($c_j \in \mathbb{Q}$, $k_j \in \mathbb{N}_0$), there exist a split torus $X = \mathbb{G}_m^k$ defined over $K=\overline{\mathbb{F}_p}(t)$, an endomorphism $Φ$ of $X$, $α\in X(K)$ and a closed subvariety $V \subseteq X$ such that $\left \{ n \in \mathbb{N}_0 : Φ^n(α) \in V(K) \right \} = S$.
Submitted 8 March, 2024;
originally announced March 2024.
-
How Well Can Transformers Emulate In-context Newton's Method?
Authors:
Angeliki Giannou,
Liu Yang,
Tianhao Wang,
Dimitris Papailiopoulos,
Jason D. Lee
Abstract:
Transformer-based models have demonstrated remarkable in-context learning capabilities, prompting extensive research into their underlying mechanisms. Recent studies have suggested that Transformers can implement first-order optimization algorithms for in-context learning and even second-order ones for the case of linear regression. In this work, we study whether Transformers can perform higher-order optimization methods, beyond the case of linear regression. We establish that linear attention Transformers with ReLU layers can approximate second-order optimization algorithms for the task of logistic regression and achieve $ε$ error with only logarithmically many (in the error) additional layers. As a by-product, we demonstrate the ability of even linear attention-only Transformers to implement a single step of Newton's iteration for matrix inversion with merely two layers. These results suggest the ability of the Transformer architecture to implement complex algorithms, beyond gradient descent.
Submitted 5 March, 2024;
originally announced March 2024.
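The entry above mentions Newton's iteration for matrix inversion. As a point of reference, here is a minimal sketch of the classical iteration X_{k+1} = X_k (2I - A X_k) in plain Python. It illustrates only the numerical method, not the Transformer construction from the paper. The helper names, the step count, the example matrix, and the starting guess X_0 = A^T / (||A||_1 ||A||_inf) (a standard choice for nonsingular A) are assumptions made for this sketch.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def newton_inverse(a, steps=30):
    """Approximate the inverse of a square matrix with Newton's iteration."""
    n = len(a)
    # Starting guess X0 = A^T / (||A||_1 * ||A||_inf).
    norm_1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    norm_inf = max(sum(abs(v) for v in row) for row in a)
    scale = norm_1 * norm_inf
    x = [[a[j][i] / scale for j in range(n)] for i in range(n)]
    for _ in range(steps):
        ax = matmul(a, x)
        # Correction factor 2I - A X; one Newton step is X <- X (2I - A X).
        corr = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
                for i in range(n)]
        x = matmul(x, corr)
    return x


# Hypothetical 2x2 example; its exact inverse is [[0.3, -0.1], [-0.2, 0.4]].
a = [[4.0, 1.0], [2.0, 3.0]]
x = newton_inverse(a)
print(x)
```

Each step squares the residual I - A X, which is why a small number of steps (and, in the paper's setting, a small number of layers per step) is enough once the residual is below one in norm.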
-
Structure-Preserving Operator Learning: Modeling the Collision Operator of Kinetic Equations
Authors:
Jae Yong Lee,
Steffen Schotthöfer,
Tianbai Xiao,
Sebastian Krumscheid,
Martin Frank
Abstract:
This work explores the application of deep operator learning principles to a problem in statistical physics. Specifically, we consider the linear kinetic equation, consisting of a differential advection operator and an integral collision operator, which is a powerful yet expensive mathematical model for interacting particle systems with ample applications, e.g., in radiation transport. We investigate the capabilities of the Deep Operator network (DeepONet) approach to modelling the high dimensional collision operator of the linear kinetic equation. This integral operator has crucial analytical structures that a surrogate model, e.g., a DeepONet, needs to preserve to enable meaningful physical simulation. We propose several DeepONet modifications to encapsulate essential structural properties of this integral operator in a DeepONet model. To be precise, we adapt the architecture of the trunk-net so the DeepONet has the same collision invariants as the theoretical kinetic collision operator, thus preserving conserved quantities, e.g., mass, of the modeled many-particle system. Further, we propose an entropy-inspired data-sampling method tailored to train the modified DeepONet surrogates without requiring excessively expensive simulation-based data generation.
Submitted 26 February, 2024;
originally announced February 2024.
-
Parameter-Free Algorithms for Performative Regret Minimization under Decision-Dependent Distributions
Authors:
Sungwoo Park,
Junyeop Kwon,
Byeongnoh Kim,
Suhyun Chae,
Jeeyong Lee,
Dabeen Lee
Abstract:
This paper studies performative risk minimization, a formulation of stochastic optimization under decision-dependent distributions. We consider the general case where the performative risk can be non-convex, for which we develop efficient parameter-free optimistic optimization-based methods. Our algorithms significantly improve upon the existing Lipschitz bandit-based method in many aspects. In particular, our framework does not require knowledge about the sensitivity parameter of the distribution map and the Lipschitz constant of the loss function. This makes our framework practically favorable, together with the efficient optimistic optimization-based tree-search mechanism. We provide experimental results that demonstrate the numerical superiority of our algorithms over the existing method and other black-box optimistic optimization methods.
Submitted 23 February, 2024;
originally announced February 2024.
-
Fluctuations of the free energy of the spherical Sherrington-Kirkpatrick model with heavy-tailed interaction
Authors:
Taegyun Kim,
Ji Oon Lee
Abstract:
We consider the 2-spin spherical Sherrington--Kirkpatrick model where the interactions between the spins are given as random variables with a heavy-tailed distribution. We prove that the free energy exhibits a sharp phase transition, depending on the location of the largest eigenvalue of the interaction matrix. We also prove the order of the limiting free energy and the limiting distribution of the fluctuation of the free energy for both regimes.
Submitted 22 February, 2024;
originally announced February 2024.
-
LoRA Training in the NTK Regime has No Spurious Local Minima
Authors:
Uijeong Jang,
Jason D. Lee,
Ernest K. Ryu
Abstract:
Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLMs), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with $N$ data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank $r\lesssim \sqrt{N}$; (ii) using LoRA with rank $r\gtrsim \sqrt{N}$ eliminates spurious local minima, allowing gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.
Submitted 28 May, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Relativized Galois groups of first order theories over a hyperimaginary
Authors:
Hyoyoon Lee,
Junguk Lee
Abstract:
We study relativized Lascar groups, which are formed by relativizing Lascar groups to the solution set of a partial type $Σ$. We introduce the notion of a Lascar tuple for $Σ$ and by considering the space of types over a Lascar tuple for $Σ$, the topology for a relativized Lascar group is (re-)defined and some fundamental facts about the Galois groups of first-order theories are generalized to the relativized context. In particular, we prove that any closed subgroup of a relativized Lascar group corresponds to a stabilizer of a bounded hyperimaginary having at least one representative in the solution set of the given partial type $Σ$. Using this, we find the correspondence between subgroups of the relativized Lascar group and the relativized strong types. We also compare the relativized notion with the restricted one, and provide a condition under which the two notions coincide.
Submitted 18 February, 2024;
originally announced February 2024.
-
Fundamental Benefit of Alternating Updates in Minimax Optimization
Authors:
Jaewook Lee,
Hanseul Cho,
Chulhee Yun
Abstract:
The Gradient Descent-Ascent (GDA) algorithm, designed to solve minimax optimization problems, takes the descent and ascent steps either simultaneously (Sim-GDA) or alternately (Alt-GDA). While Alt-GDA is commonly observed to converge faster, the performance gap between the two is not yet well understood theoretically, especially in terms of global convergence rates. To address this theory-practice gap, we present fine-grained convergence analyses of both algorithms for strongly-convex-strongly-concave and Lipschitz-gradient objectives. Our new iteration complexity upper bound of Alt-GDA is strictly smaller than the lower bound of Sim-GDA; i.e., Alt-GDA is provably faster. Moreover, we propose Alternating-Extrapolation GDA (Alex-GDA), a general algorithmic framework that subsumes Sim-GDA and Alt-GDA, for which the main idea is to alternately take gradients from extrapolations of the iterates. We show that Alex-GDA satisfies a smaller iteration complexity bound, identical to that of the Extra-gradient method, while requiring fewer gradient computations. We also prove that Alex-GDA enjoys linear convergence for bilinear problems, for which both Sim-GDA and Alt-GDA fail to converge at all.
Submitted 16 February, 2024;
originally announced February 2024.
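The difference between the two update orders in the entry above can be seen on a toy problem. The sketch below runs Sim-GDA and Alt-GDA on the scalar bilinear objective f(x, y) = x * y, whose only saddle point is the origin. The objective, step size, starting point, and function names are assumptions chosen for illustration; they are not taken from the paper, and Alex-GDA is not implemented here.

```python
import math


def sim_gda(x, y, lr, steps):
    """Simultaneous GDA on f(x, y) = x * y: both updates use the old iterate."""
    for _ in range(steps):
        grad_x, grad_y = y, x  # df/dx = y, df/dy = x
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y


def alt_gda(x, y, lr, steps):
    """Alternating GDA on f(x, y) = x * y: the ascent step sees the new x."""
    for _ in range(steps):
        x = x - lr * y  # descent step on x
        y = y + lr * x  # ascent step on y, using the updated x
    return x, y


start = (1.0, 1.0)
sim_norm = math.hypot(*sim_gda(*start, lr=0.1, steps=1000))
alt_norm = math.hypot(*alt_gda(*start, lr=0.1, steps=1000))
print(sim_norm, alt_norm)
```

On this problem each Sim-GDA step multiplies the distance to the saddle point by sqrt(1 + lr^2), so the iterates spiral outward. Alt-GDA conserves the quantity x^2 + y^2 - lr * x * y, so its iterates stay on a bounded orbit without approaching the origin. This matches the abstract's remark that neither method converges on bilinear problems.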
-
Learning time-dependent PDE via graph neural networks and deep operator network for robust accuracy on irregular grids
Authors:
Sung Woong Cho,
Jae Yong Lee,
Hyung Ju Hwang
Abstract:
Scientific computing using deep learning has seen significant advancements in recent years. There has been growing interest in models that learn the operator from the parameters of a partial differential equation (PDE) to the corresponding solutions. Deep Operator Network (DeepONet) and Fourier Neural operator, among other models, have been designed with structures suitable for handling functions as inputs and outputs, enabling real-time predictions as surrogate models for solution operators. There has also been significant progress in the research on surrogate models based on graph neural networks (GNNs), specifically targeting the dynamics in time-dependent PDEs. In this paper, we propose GraphDeepONet, an autoregressive model based on GNNs, to effectively adapt DeepONet, which is well-known for successful operator learning. GraphDeepONet exhibits robust accuracy in predicting solutions compared to existing GNN-based PDE solver models. It maintains consistent performance even on irregular grids, leveraging the advantages inherited from DeepONet and enabling predictions on arbitrary grids. Additionally, unlike traditional DeepONet and its variants, GraphDeepONet enables time extrapolation for time-dependent PDE solutions. We also provide theoretical analysis of the universal approximation capability of GraphDeepONet in approximating continuous operators across arbitrary time intervals.
Submitted 12 February, 2024;
originally announced February 2024.
-
Metastability and time scales for parabolic equations with drift 2: the general time scale
Authors:
Claudio Landim,
Jungkyoung Lee,
Insuk Seo
Abstract:
Consider the elliptic operator given by \[ \mathscr{L}_εf=b\cdot\nabla f+εΔf \] for some smooth vector field $b:\mathbb{R}^d\to\mathbb{R}^d$ and $ε>0$, and the initial-valued problem on $\mathbb{R}^d$ \[ \left\{\begin{aligned}&\partial_t u_ε=\mathscr{L}_εu_ε,\\ &u_ε(0,\,\cdot)=u_0(\cdot), \end{aligned} \right. \] for some bounded continuous function $u_0$. Under the hypothesis that the diffusion o…
▽ More
Consider the elliptic operator given by \[ \mathscr{L}_εf=b\cdot\nabla f+εΔf \] for some smooth vector field $b:\mathbb{R}^d\to\mathbb{R}^d$ and $ε>0$, and the initial-value problem on $\mathbb{R}^d$ \[ \left\{\begin{aligned}&\partial_t u_ε=\mathscr{L}_εu_ε,\\ &u_ε(0,\,\cdot)=u_0(\cdot), \end{aligned} \right. \] for some bounded continuous function $u_0$. Under the hypothesis that the diffusion on $\mathbb{R}^d$ induced by $\mathscr{L}_ε$ has a Gibbs invariant measure of the form $\exp \{-U(x)/ε\}dx$ for some smooth Morse potential function $U$, we provide the complete characterization of the multi-scale behavior of the solution $u_ε$ in the regime $ε\to0$. More precisely, we find the critical time scales $1\ll θ_ε^{(1)}\ll\cdots\ll θ_ε^{(q)}$ as $ε\to0$, and the kernels $R_t^{(p)}:M_0\times M_0\to\mathbb{R}_+$, where $M_0$ denotes the set of local minima of $U$, such that \[ \lim_{ε\to0}u_ε(tθ_ε^{(p)},\,x)=\sum_{m'\in M_0}R_t^{(p)}(m,\,m')u_0(m'), \] for all $t>0$ and $x$ in the domain of attraction of $m$ for the dynamical system $\dot{x}(t)=b(x(t))$. We then complete the characterization of the solution $u_ε$ by computing the exact asymptotic limit of the solution between time scales $θ_ε^{(p)}$ and $θ_ε^{(p+1)}$ for each $p$, where $θ_ε^{(0)}=1$ and $θ_ε^{(q+1)}=\infty$. Our proof relies on the full tree-structure characterization of the metastable behavior, at different time scales, of the diffusion induced by $\mathscr{L}_ε$. This result can be regarded as a precise refinement of Freidlin-Wentzell theory, one that had not been known for more than half a century.
Submitted 12 February, 2024;
originally announced February 2024.
-
The Powell Conjecture for the genus-three Heegaard splitting of the $3$-sphere
Authors:
Sangbum Cho,
Yuya Koda,
Jung Hoon Lee
Abstract:
The Powell Conjecture states that four specific elements suffice to generate the Goeritz group of the Heegaard splitting of the $3$-sphere. We present an alternative proof of the Powell Conjecture when the genus of the splitting is $3$, and suggest a strategy for the case of higher genera.
Submitted 12 February, 2024;
originally announced February 2024.
-
Generators for the cohomology of the moduli space of irregular parabolic Higgs bundles
Authors:
Jia Choon Lee,
Sukjoo Lee
Abstract:
We prove that the pure part of the cohomology ring of the moduli space of irregular $\underlineξ$-parabolic Higgs bundles is generated by the Künneth components of the Chern classes of a universal bundle and the Chern classes of the successive quotients of a universal flag of subbundles. As an application, in the regular full-flag case, we demonstrate a similar result for the cohomology ring of the moduli spaces of parabolic and strongly parabolic Higgs bundles.
Submitted 7 February, 2024;
originally announced February 2024.
-
Multiplicative Thom-Sebastiani for Bernstein-Sato polynomials
Authors:
Jonghyun Lee
Abstract:
We show that if $f\in \mathcal{O}_X(X)$ and $g\in \mathcal{O}_Y(Y)$ are nonzero regular functions on smooth complex algebraic varieties $X$ and $Y$, then the Bernstein-Sato polynomial of the product function $fg \in \mathcal{O}_{X\times Y}(X \times Y)$ is given by $b_{fg}(s)=b_f(s)b_g(s)$. This answers a question of Budur in \cite{Bud12} and of Popa in \cite{Pop21}.
Submitted 14 June, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
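As a small illustration of the product formula in the abstract above (a worked instance chosen here for concreteness, not an example taken from the paper), take $X=Y=\mathbb{A}^1$ with coordinates $x$ and $y$, and let $f=x$, $g=y$. The standard functional equations give:

```latex
\[
\partial_x\, x^{s+1} = (s+1)\,x^{s}
\quad\Longrightarrow\quad b_f(s) = s+1,
\qquad
\partial_y\, y^{s+1} = (s+1)\,y^{s}
\quad\Longrightarrow\quad b_g(s) = s+1,
\]
\[
\partial_x \partial_y\,(xy)^{s+1} = (s+1)^2\,(xy)^{s}
\quad\Longrightarrow\quad b_{fg}(s) = (s+1)^2 = b_f(s)\,b_g(s).
\]
```

Here the operator for the product $xy$ is simply the product of the operators for $x$ and $y$, which is the simplest case of the multiplicativity stated in the abstract.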
-
Minimal grid diagrams of the prime knots with crossing number 13 and arc index 13
Authors:
Hwa Jeong Lee,
Yoonsang Lee,
Chanmin Lee,
Yeseo Park,
Hun Kim,
Gyo Taek Jin
Abstract:
We give a list of minimal grid diagrams of the 13-crossing prime nonalternating knots which have arc index 13. There are 9,988 prime knots with crossing number 13. Among them, 4,878 are alternating and have arc index 15. Among the remaining nonalternating knots, 49, 399, 1,412 and 3,250 have arc index 10, 11, 12, and 13, respectively. We used the Dowker-Thistlethwaite codes of the 3,250 knots, provided by the program Knotscape, to generate spanning trees of the corresponding knot diagrams and thereby obtain minimal arc presentations in the form of grid diagrams.
Submitted 4 February, 2024;
originally announced February 2024.
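The counts quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the numbers stated in the abstract; the variable names are illustrative and nothing here comes from the authors' code or from Knotscape.

```python
# Counts quoted in the abstract for prime knots with crossing number 13.
alternating = 4878  # alternating knots, all with arc index 15

# Nonalternating knots, keyed by arc index.
nonalternating_by_arc_index = {10: 49, 11: 399, 12: 1412, 13: 3250}

nonalternating = sum(nonalternating_by_arc_index.values())
total = alternating + nonalternating

print(nonalternating)  # 5110
print(total)           # 9988
```

The total agrees with the 9,988 prime knots of crossing number 13 stated in the abstract.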
-
On the Hardness of Short and Sign-Compatible Circuit Walks
Authors:
Steffen Borgwardt,
Weston Grewe,
Sean Kafer,
Jon Lee,
Laura Sanità
Abstract:
The circuits of a polyhedron are a superset of its edge directions. Circuit walks, a sequence of steps along circuits, generalize edge walks and are "short" if they have few steps or small total length. Both interpretations of short are relevant to the theory and application of linear programming.
We study the hardness of several problems relating to the construction of short circuit walks. We establish that for a pair of vertices of a $0/1$-network-flow polytope, it is NP-complete to determine the length of a shortest circuit walk, even if we add the requirement that the walk must be sign-compatible. Our results also imply that determining the minimal number of circuits needed for a sign-compatible decomposition is NP-complete. Further, we show that it is NP-complete to determine the smallest total length (for $p$-norms $\lVert \cdot \rVert_p$, $1 < p \leq \infty$) of a circuit walk between a pair of vertices. One method to construct a short circuit walk is to pick up a correct facet at each step, which generalizes a non-revisiting walk. We prove that it is NP-complete to determine if there is a circuit direction that picks up a correct facet; in contrast, this problem can be solved in polynomial time for TU polyhedra.
Submitted 8 March, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Good and Fast Row-Sparse ah-Symmetric Reflexive Generalized Inverses
Authors:
Gabriel Ponte,
Marcia Fampa,
Jon Lee,
Luze Xu
Abstract:
We present several algorithms aimed at constructing sparse and structured sparse (row-sparse) generalized inverses, with application to the efficient computation of least-squares solutions, for inconsistent systems of linear equations, in the setting of multiple right-hand sides and a rank-deficient constraint matrix. Leveraging our earlier formulations to minimize the 1- and 2,1- norms of generalized inverses that satisfy important properties of the Moore-Penrose pseudoinverse, we develop efficient and scalable ADMM algorithms to address these norm-minimization problems and to limit the number of nonzero rows in the solution. We establish a 2,1-norm approximation result for a local-search procedure that was originally designed for 1-norm minimization, and we compare the ADMM algorithms with the local-search procedure and with general-purpose optimization solvers.
Submitted 25 June, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
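To make the setting of the abstract above concrete, here is a minimal NumPy sketch of why a generalized inverse $H$ of $A$ with $AHA=A$ and $AH$ symmetric yields least-squares solutions $X=HB$ for multiple right-hand sides. It uses the dense Moore-Penrose pseudoinverse as a stand-in for $H$; it is not the authors' ADMM or local-search method, it makes no attempt at sparsity, and the matrix sizes, random data, and tolerance are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rank-deficient 6x4 matrix of rank 2, with three right-hand sides.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
B = rng.standard_normal((6, 3))

# The Moore-Penrose pseudoinverse stands in for a generalized inverse H.
# An explicit cutoff keeps numerically zero singular values from being inverted.
H = np.linalg.pinv(A, rcond=1e-10)

# Properties named in the title and abstract.
is_generalized_inverse = np.allclose(A @ H @ A, A)  # A H A = A
is_reflexive = np.allclose(H @ A @ H, H)            # H A H = H
is_ah_symmetric = np.allclose(A @ H, (A @ H).T)     # (A H)^T = A H

# X = H B solves every least-squares problem min ||A x - b_j|| at once:
# the normal equations A^T (A X - B) = 0 hold up to rounding error.
X = H @ B
normal_eq_residual = np.linalg.norm(A.T @ (A @ X - B))

print(is_generalized_inverse, is_reflexive, is_ah_symmetric)
print(normal_eq_residual)
```

The pseudoinverse is generally dense. The abstract's goal, by contrast, is an $H$ that keeps these properties while having small 1-norm or 2,1-norm, the latter limiting the number of nonzero rows.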
-
A parameter-free approach for solving SOS-convex semi-algebraic fractional programs
Authors:
Chengmiao Yang,
Liguo Jiao,
Jae Hyoung Lee
Abstract:
In this paper, we study a class of nonsmooth fractional programs (FP, for short) with SOS-convex semi-algebraic functions. Under suitable assumptions, we derive a strong duality result between the problem (FP) and its semidefinite programming (SDP) relaxations. Remarkably, we extract an optimal solution of the problem (FP) by solving one and only one associated SDP problem. Numerical examples are also given.
Submitted 29 January, 2024;
originally announced January 2024.
-
Well-posedness, magnetic helicity conservation, inviscid limit and asymptotic stability for the generalized Navier-Stokes-Maxwell equations
Authors:
Kyungkeun Kang,
Jihoon Lee,
Dinh Duong Nguyen
Abstract:
This paper is devoted to studying the well-posedness, conservation of magnetic helicity, inviscid limit and asymptotic stability of the generalized Navier-Stokes-Maxwell (NSM) equations with the standard Ohm's law in $\mathbb{R}^d$ for $d \in \{2,3\}$. More precisely, global well-posedness is established in the case of fractional Laplacian velocity $(-Δ)^αv$ with $α= \frac{d}{2}$ for suitable data. In addition, local well-posedness in the inviscid case is provided for sufficiently smooth data, which allows us to study the inviscid limit of the associated positive-viscosity solutions in the case $α= 1$, where an explicit bound on the difference is given. Furthermore, in three dimensions, if the initial data satisfies further suitable conditions, then magnetic helicity is conserved as the electric conductivity goes to infinity. On the other hand, in the case $α= 0$, stability near a magnetohydrostatic equilibrium with a constant (or equivalently bounded) magnetic field is also obtained, in which the nonhomogeneous Sobolev norms of the velocity and electric fields and, for $p \in (2,\infty]$, the $L^p$ norm of the magnetic field converge to zero as time goes to infinity with an implicit rate. In this velocity damping case, the situation is different for both the two-and-a-half and three-dimensional (Hall)-magnetohydrodynamics ((H)-MHD) systems, where an explicit rate of convergence as time goes to infinity is computed for both the velocity and magnetic fields in nonhomogeneous Sobolev norms. Therefore, it seems that there is a gap between NSM and MHD in terms of the norm convergence of the magnetic field and the rate of decay in time, even though the latter equations can be obtained as a limiting system of the former in the sense of distributions as the speed of light tends to infinity.
Submitted 15 June, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.