
Exploiting Structure in Quantum Relative Entropy Programs

Authors: Kerry He, James Saunderson, Hamza Fawzi

Abstract: Quantum relative entropy programs are convex optimization problems which minimize a linear functional over an affine section of the epigraph of the quantum relative entropy function. Recently, the self-concordance of a natural barrier function was proved for this set. This has opened up the opportunity to use interior-point methods for nonsymmetric cone programs to solve these optimization problems. In this paper, we show how common structures arising from applications in quantum information theory can be exploited to improve the efficiency of solving quantum relative entropy programs using interior-point methods. First, we show that the natural barrier function for the epigraph of the quantum relative entropy composed with positive linear operators is optimally self-concordant, even when these linear operators map to singular matrices. Second, we show how we can exploit a catalogue of common structures in these linear operators to compute the inverse Hessian products of the barrier function more efficiently. This step is typically the bottleneck when solving quantum relative entropy programs using interior-point methods, and therefore improving the efficiency of this step can significantly improve the computational performance of the algorithm. We demonstrate how these methods can be applied to important applications in quantum information theory, including quantum key distribution, quantum rate-distortion, quantum channel capacities, and estimating the ground state energy of Hamiltonians.
Our numerical results show that these techniques improve computation times by up to several orders of magnitude, and allow previously intractable problems to be solved.

Submitted 28 June, 2024; originally announced July 2024.

Comments: 36 pages, 8 tables
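
For reference, the function whose epigraph these programs constrain is the quantum relative entropy $S(X\|Y)=\operatorname{tr}[X(\log X-\log Y)]$. The NumPy sketch below only evaluates this function through eigendecompositions; it assumes natural logarithms, Hermitian $X\succeq 0$ and $Y\succ 0$ (so that $\log Y$ is finite), and its helper names are illustrative. It does not reproduce the barrier function or the structured inverse-Hessian products discussed in the abstract.

```python
import numpy as np


def trace_x_log_x(x: np.ndarray) -> float:
    """tr[X log X] for Hermitian PSD X, with the convention 0 log 0 = 0."""
    w = np.linalg.eigvalsh(x)
    w = w[w > 1e-15]  # drop (numerically) zero eigenvalues
    return float(np.sum(w * np.log(w)))


def logm_pd(y: np.ndarray) -> np.ndarray:
    """Matrix logarithm of a Hermitian positive definite matrix via eigh."""
    w, v = np.linalg.eigh(y)
    if np.any(w <= 0):
        raise ValueError("Y must be positive definite")
    return (v * np.log(w)) @ v.conj().T


def quantum_relative_entropy(x: np.ndarray, y: np.ndarray) -> float:
    """S(X||Y) = tr[X log X] - tr[X log Y], natural logarithm."""
    return trace_x_log_x(x) - float(np.real(np.trace(x @ logm_pd(y))))


# Commuting (diagonal) inputs reduce to the classical Kullback-Leibler divergence.
p, q = np.array([0.5, 0.5]), np.array([0.25, 0.75])
print(quantum_relative_entropy(np.diag(p), np.diag(q)))  # equals sum(p * log(p / q))
```

The singular first argument is allowed (its zero eigenvalues contribute nothing), which loosely mirrors the abstract's point that the operators involved may map to singular matrices.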

arXiv:2404.09250 [pdf, ps, other]

A norm inequality on noncommutative symmetric spaces related to a question of Bourin

Authors: **chen Liu, Kan He, Xingpeng Zhao

Abstract: In this note, we study a question posed by Bourin \cite{2009Matrix} and give a partial answer. Specifically, for $t\in[0,\frac{1}{4}]\cup[\frac{3}{4},1]$, we show that $|||x^{t}y^{1-t}+y^{t}x^{1-t}|||\leq|||x+y|||$, where $x,y\in\mathbb{M}_{n}(\mathbb{C})^+$ and $|||\cdot|||$ is a unitarily invariant norm. Moreover, we prove that this inequality also holds in noncommutative fully symmetric spaces.

Submitted 14 April, 2024; originally announced April 2024.
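
The inequality can be probed numerically. The sketch below is a random sanity check, not a proof: it uses the trace norm (sum of singular values) as one example of a unitarily invariant norm, real symmetric positive definite matrices as a special case of $\mathbb{M}_n(\mathbb{C})^+$, and illustrative helper names. Another norm can be passed through the `norm` argument.

```python
import numpy as np


def psd_power(a: np.ndarray, t: float) -> np.ndarray:
    """a**t for a Hermitian positive semidefinite matrix, via an eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # remove tiny negative round-off
    return (v * w**t) @ v.conj().T


def trace_norm(m: np.ndarray) -> float:
    """Sum of singular values: one example of a unitarily invariant norm."""
    return float(np.sum(np.linalg.svd(m, compute_uv=False)))


def bourin_gap(x, y, t, norm=trace_norm) -> float:
    """norm(x + y) - norm(x^t y^(1-t) + y^t x^(1-t)); nonnegative when the inequality holds."""
    lhs = psd_power(x, t) @ psd_power(y, 1 - t) + psd_power(y, t) @ psd_power(x, 1 - t)
    return norm(x + y) - norm(lhs)


rng = np.random.default_rng(0)


def random_pd(n: int) -> np.ndarray:
    """Random real symmetric positive definite matrix."""
    g = rng.standard_normal((n, n))
    return g @ g.T + 1e-3 * np.eye(n)


# Sample t from the range covered by the abstract's result.
gaps = [bourin_gap(random_pd(4), random_pd(4), t) for t in (0.1, 0.25, 0.75, 0.9) for _ in range(25)]
print(min(gaps))  # expected to be nonnegative up to round-off
```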

arXiv:2402.05438 [pdf, other]

Penalized spline estimation of principal components for sparse functional data: rates of convergence

Authors: Shiyuan He, Jianhua Z. Huang, Kejun He

Abstract: This paper gives a comprehensive treatment of the convergence rates of penalized spline estimators for simultaneously estimating several leading principal component functions, when the functional data is sparsely observed. The penalized spline estimators are defined as the solution of a penalized empirical risk minimization problem, where the loss function belongs to a general class of loss functions motivated by the matrix Bregman divergence, and the penalty term is the integrated squared derivative. The theory reveals that the asymptotic behavior of penalized spline estimators depends on the interesting interplay between several factors, i.e., the smoothness of the unknown functions, the spline degree, the spline knot number, the penalty order, and the penalty parameter. The theory also classifies the asymptotic behavior into seven scenarios and characterizes whether and how the minimax optimal rates of convergence are achievable in each scenario.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2309.15919 [pdf, other]

doi 10.22331/q-2024-04-09-1314

Efficient Computation of the Quantum Rate-Distortion Function

Authors: Kerry He, James Saunderson, Hamza Fawzi

Abstract: The quantum rate-distortion function plays a fundamental role in quantum information theory; however, there is currently no practical algorithm that can efficiently compute this function to high accuracy for moderate channel dimensions. In this paper, we show how symmetry reduction can significantly simplify common instances of the entanglement-assisted quantum rate-distortion problem. This allows us to better understand the properties of the quantum channels that attain the optimal rate-distortion trade-off, while also allowing more efficient computation of the quantum rate-distortion function regardless of the numerical algorithm used. Additionally, we propose an inexact variant of the mirror descent algorithm to compute the quantum rate-distortion function with provable sublinear convergence rates. We show how this mirror descent algorithm is related to Blahut-Arimoto and expectation-maximization methods previously used to solve similar problems in information theory. Using these techniques, we present the first numerical experiments to compute a multi-qubit quantum rate-distortion function, and show that our proposed algorithm solves faster and to higher accuracy than existing methods.

Submitted 2 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: 37 pages, 2 figures, 2 tables. v2: Minor edits to introduction, abstract, and notation. v3: Changes based on reviewer comments, changed to Quantum template

Journal ref: Quantum 8, 1314 (2024)
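
For orientation, the classical Blahut-Arimoto iteration for the rate-distortion function, which the abstract relates to its mirror descent method, can be sketched as follows. This is the classical algorithm only (not the quantum or inexact variant), it assumes the usual parameterization by a positive Lagrange multiplier `beta` (minus the slope of $R(D)$ in nats), and the function names are illustrative.

```python
import numpy as np


def blahut_arimoto_rd(p_x, dist, beta, iters=2000):
    """Classical Blahut-Arimoto iteration for one point on the rate-distortion curve.

    p_x:  source distribution, shape (nx,)
    dist: distortion matrix d(x, y), shape (nx, ny)
    beta: positive Lagrange multiplier (minus the slope of R(D), in nats)
    Returns (D, R) with the rate R in bits.
    """
    ny = dist.shape[1]
    r_y = np.full(ny, 1.0 / ny)          # reproduction marginal, start uniform
    kernel = np.exp(-beta * dist)
    for _ in range(iters):
        q = r_y * kernel                 # test channel q(y|x) proportional to r(y) exp(-beta d(x,y))
        q /= q.sum(axis=1, keepdims=True)
        r_y = p_x @ q                    # marginal induced by the test channel
    joint = p_x[:, None] * q
    distortion = float(np.sum(joint * dist))
    rate = float(np.sum(joint * np.log2(q / r_y)))  # I(X;Y) of the final test channel
    return distortion, rate


def binary_entropy(z):
    return float(-z * np.log2(z) - (1 - z) * np.log2(1 - z))


# Bernoulli(0.2) source with Hamming distortion, where R(D) = h(0.2) - h(D) for D <= 0.2.
p_x = np.array([0.8, 0.2])
hamming = np.array([[0.0, 1.0], [1.0, 0.0]])
D, R = blahut_arimoto_rd(p_x, hamming, beta=2.0)
print(D, R, binary_entropy(0.2) - binary_entropy(D))
```

The alternating update of the test channel and its output marginal is the multiplicative structure that a mirror descent (Bregman proximal) reading of Blahut-Arimoto builds on.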

arXiv:2306.04492 [pdf, ps, other]

A Bregman Proximal Perspective on Classical and Quantum Blahut-Arimoto Algorithms

Authors: Kerry He, James Saunderson, Hamza Fawzi

Abstract: The Blahut-Arimoto algorithm is a well-known method to compute classical channel capacities and rate-distortion functions. Recent works have extended this algorithm to compute various quantum analogs of these quantities. In this paper, we show how these Blahut-Arimoto algorithms are special instances of mirror descent, which is a type of Bregman proximal method, and a well-studied generalization of gradient descent for constrained convex optimization. Using recently developed convex analysis tools, we show how analysis based on relative smoothness and strong convexity recovers known sublinear and linear convergence rates for Blahut-Arimoto algorithms. This Bregman proximal viewpoint allows us to derive related algorithms with similar convergence guarantees to solve problems in information theory for which Blahut-Arimoto-type algorithms are not directly applicable. We apply this framework to compute energy-constrained classical and quantum channel capacities, classical and quantum rate-distortion functions, and approximations of the relative entropy of entanglement, all with provable convergence guarantees.

Submitted 7 June, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: 30 pages. v2: Revised introduction and numerical experiments; strengthened proof for Theorem 4.7; other minor edits throughout. v3: Accepted into IEEE Transactions on Information Theory
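
As a concrete reference point for the abstract's starting example, here is a sketch of the classical Blahut-Arimoto iteration for the capacity of a discrete memoryless channel. Its update multiplies the input distribution by exponentiated divergences and renormalizes, the kind of entropic, mirror-descent-type step the abstract refers to; the step-size interpretation and the function names are assumptions of this sketch, not the paper's formulation.

```python
import numpy as np


def _kl_rows(channel, q_y, log):
    """Row-wise divergence D(W(.|x) || q_y), with the convention 0 log 0 = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(channel > 0, channel * log(channel / q_y), 0.0)
    return terms.sum(axis=1)


def blahut_arimoto_capacity(channel, iters=10_000, tol=1e-13):
    """Classical Blahut-Arimoto for a discrete memoryless channel W[x, y] = P(y|x).

    Returns (capacity in bits, capacity-achieving input distribution).
    """
    p = np.full(channel.shape[0], 1.0 / channel.shape[0])
    for _ in range(iters):
        d = _kl_rows(channel, p @ channel, np.log)
        p_new = p * np.exp(d)            # multiplicative (entropic) update
        p_new /= p_new.sum()
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    capacity = float(p @ _kl_rows(channel, p @ channel, np.log2))  # I(X;Y) in bits
    return capacity, p


bsc = np.array([[0.9, 0.1], [0.1, 0.9]])        # binary symmetric channel, crossover 0.1
z_channel = np.array([[1.0, 0.0], [0.5, 0.5]])  # Z-channel with P(0|1) = 0.5
print(blahut_arimoto_capacity(bsc)[0])          # 1 - h(0.1), about 0.531
print(blahut_arimoto_capacity(z_channel)[0])    # log2(5/4), about 0.322
```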

arXiv:2212.14847 [pdf, ps, other]

Deterministic counting Lovász local lemma beyond linear programming

Authors: Kun He, Chunyang Wang, Yitong Yin

Abstract: We give a simple combinatorial algorithm to deterministically approximately count the number of satisfying assignments of general constraint satisfaction problems (CSPs). Suppose that the CSP has domain size $q=O(1)$, each constraint contains at most $k=O(1)$ variables, shares variables with at most $\Delta=O(1)$ constraints, and is violated with probability at most $p$ by a uniform random assignment. The algorithm runs in polynomial time in an improved local lemma regime: \[ q^2\cdot k\cdot p\cdot\Delta^5\le C_0\quad\text{for a suitably small absolute constant }C_0. \] Here the key term $\Delta^5$ improves the previously best known $\Delta^7$ for general CSPs [JPV21b] and $\Delta^{5.714}$ for the special case of $k$-CNF [JPV21a, HSW21]. Our deterministic counting algorithm is a derandomization of the very recent fast sampling algorithm in [HWY22]. It departs substantially from all previous deterministic counting Lovász local lemma algorithms, which relied on linear programming, and gives a deterministic approximate counting algorithm that straightforwardly derandomizes a fast sampling algorithm, hence unifying fast sampling and deterministic approximate counting in the same algorithmic framework. To obtain the improved regime, we develop in our analysis a refinement of the $\{2,3\}$-trees that were used in previous analyses of counting/sampling LLL. Similar techniques can be applied to the previous LP-based algorithms to obtain the same improved regime and may be of independent interest.

Submitted 30 December, 2022; originally announced December 2022.

Comments: Accepted to SODA 2023. arXiv admin note: text overlap with arXiv:2204.01520

arXiv:2211.04874 [pdf, other]

A Unified Analysis of Multi-task Functional Linear Regression Models with Manifold Constraint and Composite Quadratic Penalty

Authors: Shiyuan He, Hanxuan Ye, Kejun He

Abstract: This work studies multi-task functional linear regression models in which both the covariates and the unknown regression coefficients (called slope functions) are curves. For slope function estimation, we employ penalized splines to balance bias, variance, and computational complexity. The power of multi-task learning is brought in by imposing additional structures on the slope functions. We propose a general model with double regularization over the spline coefficient matrix: i) a matrix manifold constraint, and ii) a composite penalty given by a sum of quadratic terms. Many multi-task learning approaches, such as a reduced-rank model and a graph Laplacian regularized model, can be treated as special cases of the proposed model. We show that the composite penalty induces a specific norm, which helps to quantify the manifold curvature and determine the corresponding proper subset in the manifold tangent space. The complexity of this tangent-space subset is then linked to the complexity of the geodesic neighborhood via generic chaining. A unified convergence upper bound is obtained and specifically applied to the reduced-rank model and the graph Laplacian regularized model. The phase transition behaviors of the estimators are examined as we vary the configurations of model parameters.

Submitted 31 July, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.06025 [pdf, other]

Bregman Divergence-Based Data Integration with Application to Polygenic Risk Score (PRS) Heterogeneity Adjustment

Authors: Qinmengge Li, Matthew T. Patrick, Haihan Zhang, Chachrit Khunsriraksakul, Philip E. Stuart, Johann E. Gudjonsson, Rajan Nair, James T. Elder, Dajiang J. Liu, Jian Kang, Lam C. Tsoi, Kevin He

Abstract: Polygenic risk scores (PRS) have recently received much attention for genetic risk prediction. While successful for the Caucasian population, PRS built from minority-population data suffer from small sample sizes, high dimensionality and low signal-to-noise ratios, exacerbating already severe health disparities. Due to population heterogeneity, direct trans-ethnic prediction that applies the Caucasian model to a minority population also has limited performance. In addition, due to data privacy, individual-level genotype data are not accessible for either the Caucasian population or the minority population. To address these challenges, we propose a Bregman divergence-based estimation procedure to measure and optimally balance the information from different populations. The proposed method only requires the use of encrypted summary statistics and improves PRS performance for ethnic minority groups by incorporating additional information. We establish the asymptotic consistency and weak oracle property of the proposed method. Simulations and real data analyses also show its advantages in prediction and variable selection.

Submitted 12 October, 2022; originally announced October 2022.

Comments: 35 pages, 6 figures

arXiv:2207.11892 [pdf, ps, other]

Improved Bounds for Sampling Solutions of Random CNF Formulas

Authors: Kun He, Kewen Wu, Kuan Yang

Abstract: Let $Φ$ be a random $k$-CNF formula on $n$ variables and $m$ clauses, where each clause is a disjunction of $k$ literals chosen independently and uniformly. Our goal is to sample an approximately uniform solution of $Φ$ (or equivalently, approximate the partition function of $Φ$). Let $α=m/n$ be the density. The previous best algorithm runs in time $n^{\mathsf{poly}(k,α)}$ for any $α\lesssim2^{k/300}$ [Galanis, Goldberg, Guo, and Yang, SIAM J. Comput.'21]. Our result significantly improves both bounds by providing an almost-linear time sampler for any $α\lesssim2^{k/3}$. The density $α$ captures the \emph{average degree} in the random formula. In the worst-case model with bounded \emph{maximum degree}, the current best efficient sampler works up to degree bound $2^{k/5}$ [He, Wang, and Yin, FOCS'22 and SODA'23], which is, for the first time, superseded by its average-case counterpart due to our $2^{k/3}$ bound. Our result is the first progress towards establishing the intuition that the solvability of the average-case model (random $k$-CNF formula with bounded average degree) is better than that of the worst-case model (standard $k$-CNF formula with bounded maximal degree) in terms of sampling solutions.

Submitted 9 June, 2023; v1 submitted 24 July, 2022; originally announced July 2022.

Comments: 51 pages, all proofs added, and bounds slightly improved
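
To make the random model and the sampling goal concrete, the sketch below generates a random $k$-CNF formula with density $α=m/n$ and, for a tiny $n$, draws an exactly uniform solution by brute-force enumeration. The enumeration is exponential in $n$ and only illustrates the target distribution; it is unrelated to the paper's almost-linear time sampler. Drawing each literal uniformly from the $2n$ literals, with repeats allowed, is an assumption about the exact model, and all names are illustrative.

```python
import itertools
import random


def random_kcnf(n, m, k, rng):
    """Random k-CNF with m clauses over variables 1..n.

    Each literal is drawn independently and uniformly from the 2n literals
    (a negative integer denotes a negated variable); repeats are allowed.
    """
    return [[rng.choice((1, -1)) * rng.randint(1, n) for _ in range(k)] for _ in range(m)]


def satisfies(assignment, formula):
    """assignment[v] is the truth value of variable v (index 0 is unused)."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in formula)


def exact_uniform_solution(formula, n, rng):
    """Exact uniform sampler by enumerating all 2^n assignments (tiny n only)."""
    solutions = [a for a in itertools.product((False, True), repeat=n)
                 if satisfies((None,) + a, formula)]
    return (None,) + rng.choice(solutions) if solutions else None


rng = random.Random(0)
n, alpha, k = 12, 2.0, 3
formula = random_kcnf(n, int(alpha * n), k, rng)  # density alpha = m / n
sample = exact_uniform_solution(formula, n, rng)
print(len(formula), sample is not None and satisfies(sample, formula))
```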

arXiv:2206.00946 [pdf, ps, other]

An efficient thermal lattice Boltzmann method for simulating three-dimensional liquid-vapor phase change

Authors: Jiangxu Huang, Lei Wang, Kun He, Changsheng Huang

Abstract: In this paper, a multiple-relaxation-time lattice Boltzmann (LB) approach is developed for the simulation of three-dimensional (3D) liquid-vapor phase change based on the pseudopotential model. In contrast to some existing 3D thermal LB models for liquid-vapor phase change, the present approach has two advantages: first, it does not require calculating the gradient of volumetric heat capacity [i.e., $\nabla \left( {ρ{c_v}} \right)$]; second, it is constructed on the lattice with seven discrete velocities in three dimensions (D3Q7), making the thermal LB model more efficient and easier to implement. Also, based on the scheme proposed by Zhou and He [Phys Fluids 9:1591-1598, 1997], a pressure boundary condition for the D3Q19 lattice is proposed to model multiphase flow in open systems. The method is then validated by considering the temperature distribution in a 3D saturated liquid-vapor system, the $d^2$ law, and droplet evaporation on a heated surface. The numerical results agree well with the analytical solutions, the results of the finite difference method, and the experimental data. Our numerical results indicate that the present approach is reliable and efficient in dealing with 3D liquid-vapor phase change.

Submitted 2 June, 2022; originally announced June 2022.
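
The efficiency argument rests on the small D3Q7 velocity set for the temperature populations. The sketch below shows that lattice with a single-relaxation-time (BGK) collision and periodic streaming for pure heat conduction. This is a deliberate simplification: the paper uses a multiple-relaxation-time scheme coupled to a pseudopotential phase-change model, which is not reproduced here. The weights $w_0=1/4$, $w_{1..6}=1/8$ (giving $c_s^2=1/4$) are a common choice assumed for this sketch, and the function names are illustrative.

```python
import numpy as np

# D3Q7 velocity set: a rest population plus the six axis directions.
C = np.array([[0, 0, 0], [1, 0, 0], [-1, 0, 0],
              [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]])
W = np.array([1 / 4] + [1 / 8] * 6)  # assumed weights: cs^2 = sum_i w_i c_ix^2 = 1/4
CS2 = 1 / 4


def equilibrium(T):
    """Pure-conduction equilibrium g_i^eq = w_i T (no advection term)."""
    return W[:, None, None, None] * T[None, ...]


def bgk_step(g, tau):
    """One BGK collision followed by periodic streaming of the 7 populations."""
    T = g.sum(axis=0)                            # temperature = zeroth moment
    g = g - (g - equilibrium(T)) / tau           # relax toward equilibrium
    for i, c in enumerate(C):
        g[i] = np.roll(g[i], shift=tuple(c), axis=(0, 1, 2))  # move along c_i
    return g


def thermal_diffusivity(tau):
    """Lattice-unit diffusivity of this BGK scheme: cs^2 (tau - 1/2)."""
    return CS2 * (tau - 0.5)


rng = np.random.default_rng(0)
T0 = 1.0 + 0.1 * rng.random((8, 8, 8))
g = equilibrium(T0)
for _ in range(50):
    g = bgk_step(g, tau=0.8)
print(g.sum(), T0.sum())  # collision and periodic streaming both conserve the total
```

Only seven populations per node need to be stored and streamed, compared with nineteen for D3Q19, which is the source of the saving the abstract mentions.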

arXiv:2205.00686 [pdf, ps, other]

doi 10.1103/PhysRevE.106.055308

A new thermal lattice Boltzmann model for liquid-vapor phase change

Authors: Lei Wang, Jiangxu Huang, Kun He

Abstract: The lattice Boltzmann method is adopted in this article to solve liquid-vapor phase change problems. By modifying the collision term of the temperature evolution equation, a new thermal lattice Boltzmann model is constructed. Compared with previous studies, the most striking feature of the present approach is that it avoids calculating both the Laplacian term of temperature [$\nabla \cdot \left( {κ\nabla T} \right)$] and the gradient term of heat capacity [$\nabla \left( {ρ{\rm{c}_v}} \right)$]. In addition, since the present approach adopts a simple linear equilibrium distribution function, the D2Q5 lattice can be used for the two-dimensional cases considered here, making it more efficient than previous works in which the lattice is usually limited to D2Q9. The approach is first validated on droplet evaporation in open space and droplet evaporation on a heated surface, and the numerical results show good agreement with the analytical results and the finite difference method. It is then used to model nucleate boiling, and the relationship between bubble detachment diameter and gravitational acceleration obtained with the present approach agrees well with reported works.

Submitted 2 May, 2022; originally announced May 2022.
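
A linear equilibrium is what makes the five-velocity D2Q5 lattice sufficient: with $g_i^{eq}=w_iT(1+\mathbf{c}_i\cdot\mathbf{u}/c_s^2)$, the zeroth and first moments already reproduce $T$ and the advective flux $T\mathbf{u}$. The abstract does not give the exact form, so this velocity-linear equilibrium, the weights $w_0=1/3$, $w_{1..4}=1/6$ (so $c_s^2=1/3$), and the function name are assumptions of this sketch; the paper's modified collision term is not shown.

```python
import numpy as np

# D2Q5 lattice: a rest velocity plus the four axis directions.
C = np.array([[0, 0], [1, 0], [-1, 0], [0, 1], [0, -1]])
W = np.array([1 / 3, 1 / 6, 1 / 6, 1 / 6, 1 / 6])  # assumed weights, cs^2 = 1/3
CS2 = 1 / 3


def linear_equilibrium(T, u):
    """g_i^eq = w_i T (1 + c_i . u / cs^2), linear in the velocity u.

    T: temperature field, shape (nx, ny); u: velocity field, shape (2, nx, ny).
    Returns the five equilibrium populations, shape (5, nx, ny).
    """
    cu = np.einsum("id,dxy->ixy", C, u)  # c_i . u at every node
    return W[:, None, None] * T[None, :, :] * (1.0 + cu / CS2)


rng = np.random.default_rng(1)
T = 1.0 + rng.random((4, 6))
u = 0.05 * rng.standard_normal((2, 4, 6))
geq = linear_equilibrium(T, u)
# Zeroth moment recovers T; first moment recovers the advective flux T u.
print(np.allclose(geq.sum(axis=0), T), np.allclose(np.einsum("id,ixy->dxy", C, geq), T * u))
```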

arXiv:2204.04582 [pdf, ps, other]

Real order total variation with applications to the loss functions in learning schemes

Authors: Pan Liu, Xin Yang Lu, Kunlun He

Abstract: Loss functions are an essential part of modern data-driven approaches, such as bi-level training schemes and machine learning. In this paper we propose a loss function consisting of $r$-order (an)isotropic total variation semi-norms $TV^r$, $r\in \mathbb{R}^+$, defined via the Riemann-Liouville (R-L) fractional derivative. We focus on key theoretical properties of such loss functions, such as lower semi-continuity and compactness with respect to both the function and the order of derivative $r$.

Submitted 9 April, 2022; originally announced April 2022.

MSC Class: 26B30; 94A08; 47J20
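
To see what an $r$-order total variation measures, here is a one-dimensional discrete sketch. It replaces the R-L derivative with its Grünwald-Letnikov approximation on a uniform grid (a standard discretization, assumed here, with the signal treated as starting at the first sample) and sums absolute values. The paper's semi-norms are defined on images and can be anisotropic, which this sketch does not cover; the function names are illustrative.

```python
import numpy as np


def gl_weights(r, n):
    """Grünwald-Letnikov coefficients w_k = (-1)^k binom(r, k), k = 0..n-1."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (1.0 - (r + 1.0) / k)
    return w


def frac_derivative(f, r, h):
    """Left-sided Grünwald-Letnikov approximation of the order-r derivative."""
    n = len(f)
    w = gl_weights(r, n)
    return np.array([np.dot(w[: j + 1], f[j::-1]) for j in range(n)]) / h**r


def tv_r(f, r, h=1.0):
    """Discrete 1-D analogue of TV^r: h * sum_j |D^r f(x_j)|."""
    return float(h * np.sum(np.abs(frac_derivative(np.asarray(f, dtype=float), r, h))))


f = np.array([0.0, 1.0, 3.0, 2.0])
print(tv_r(f, 1.0))  # r = 1: |f_0| + sum of |first differences| = 4.0
print(tv_r(f, 2.0))  # r = 2: second differences, giving 5.0
print(tv_r(f, 0.5))  # a genuinely fractional order
```

Because $r$ enters continuously through the coefficients $w_k$, the same code interpolates between the familiar integer-order cases, which is the setting in which lower semi-continuity with respect to $r$ becomes a meaningful question.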

arXiv:2112.14356 [pdf, ps, other]

Private Private Information

Authors: Kevin He, Fedor Sandomirskiy, Omer Tamuz

Abstract: A private private information structure delivers information about an unknown state while preserving privacy: An agent's signal contains information about the state but remains independent of others' sensitive or private information. We study how informative such structures can be, and characterize those that are optimal in the sense that they cannot be made more informative without violating privacy. We connect our results to fairness in recommendation systems and explore a number of further applications.

Submitted 9 June, 2023; v1 submitted 28 December, 2021; originally announced December 2021.

arXiv:2107.03932 [pdf, ps, other]

Perfect Sampling for (Atomic) Lovász Local Lemma

Authors: Kun He, Xiaoming Sun, Kewen Wu

Abstract: We give a Markov chain based perfect sampler for uniformly sampling solutions of constraint satisfaction problems (CSPs). Under some mild Lovász local lemma conditions where each constraint of the CSP has a small number of forbidden local configurations, our algorithm is accurate and efficient: it outputs a perfect uniform random solution and its expected running time is quasilinear in the number of variables. Prior to our work, perfect samplers were only known to exist for CSPs under much more restrictive conditions (Guo, Jerrum, and Liu, JACM'19). Our algorithm has two components: 1. A simple perfect sampling algorithm using bounding chains (Huber, STOC'98; Häggström and Nelander, Scandinavian Journal of Statistics'99). This sampler is efficient if each variable domain is small. 2. A simple but powerful state tensorization trick to reduce large domains to smaller ones. This trick is a generalization of state compression (Feng, He, and Yin, STOC'21). The crux of our analysis is a simple information percolation argument, which allows us to achieve bounds even beyond the current best approximate samplers (Jain, Pham, and Vuong, ArXiv'21). Previous related works either use intricate algorithms or need sophisticated analysis, or even both. Thus we view the simplicity of both our algorithm and analysis as a strength of our work.

Submitted 8 July, 2021; originally announced July 2021.

Comments: 56 pages, 1 table, 5 figures, 9 algorithms

arXiv:2010.08766 [pdf, ps, other]

Tight Lower Complexity Bounds for Strongly Convex Finite-Sum Optimization

Authors: Min Zhang, Yao Shu, Kun He

Abstract: Finite-sum optimization plays an important role in machine learning and has therefore triggered a surge of interest in recent years. To address this optimization problem, various randomized incremental gradient methods have been proposed with guaranteed upper and lower complexity bounds for their convergence. Nonetheless, these lower bounds rely on certain conditions: a deterministic optimization algorithm, or a fixed probability distribution for the selection of component functions. Meanwhile, in certain cases some lower bounds do not even match the upper bounds of the best known methods. To break these limitations, we derive tight lower complexity bounds of randomized incremental gradient methods, including SAG, SAGA, SVRG, and SARAH, for two typical cases of finite-sum optimization. Specifically, our results tightly match the upper complexity of Katyusha or VRADA when each component function is strongly convex and smooth, and tightly match the upper complexity of SDCA without duality and of KatyushaX when the finite-sum function is strongly convex and the component functions are average smooth.

Submitted 19 June, 2022; v1 submitted 17 October, 2020; originally announced October 2020.
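
For readers less familiar with the methods whose lower bounds are studied, the sketch below runs SVRG, one of the randomized incremental gradient methods named in the abstract, on a least-squares finite sum $\frac{1}{n}\sum_i \frac{1}{2}(a_i^\top x-b_i)^2$. The step size $0.1/L_{\max}$, the inner-loop length $10n$, and the "last iterate becomes the snapshot" option are conventional choices assumed here; the sketch says nothing about the lower-bound constructions themselves.

```python
import numpy as np


def svrg_least_squares(A, b, step, epochs=50, inner=None, seed=0):
    """SVRG for min_x (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    inner = inner if inner is not None else 10 * n
    x_snap = np.zeros(d)
    for _ in range(epochs):
        full_grad = A.T @ (A @ x_snap - b) / n   # exact gradient at the snapshot
        x = x_snap.copy()
        for _ in range(inner):
            i = rng.integers(n)
            a_i = A[i]
            # Variance-reduced estimate: grad f_i(x) - grad f_i(snapshot) + full gradient.
            g = a_i * (a_i @ x - b[i]) - a_i * (a_i @ x_snap - b[i]) + full_grad
            x -= step * g
        x_snap = x                                 # last inner iterate becomes the snapshot
    return x_snap


rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)
L_max = np.max(np.sum(A**2, axis=1))               # smoothness constant of the worst component
x_svrg = svrg_least_squares(A, b, step=0.1 / L_max)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x_svrg - x_star))
```

Each inner step touches a single component $f_i$, and the snapshot gradient removes the variance that would otherwise stall plain SGD, which is why such methods converge linearly on strongly convex finite sums.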

arXiv:2009.03449 [pdf, other]

Survival Analysis via Ordinary Differential Equations

Authors: Wei**g Tang, Kevin He, Gongjun Xu, Ji Zhu

Abstract: This paper introduces an Ordinary Differential Equation (ODE) notion for survival analysis. The ODE notion not only provides a unified modeling framework, but more importantly, also enables the development of a widely applicable, scalable, and easy-to-implement procedure for estimation and inference. Specifically, the ODE modeling framework unifies many existing survival models, such as the proportional hazards model, the linear transformation model, the accelerated failure time model, and the time-varying coefficient model as special cases. The generality of the proposed framework serves as the foundation of a widely applicable estimation procedure. As an illustrative example, we develop a sieve maximum likelihood estimator for a general semi-parametric class of ODE models. In comparison to existing estimation methods, the proposed procedure has advantages in terms of computational scalability and numerical stability. Moreover, to address unique theoretical challenges induced by the ODE notion, we establish a new general sieve M-theorem for bundled parameters and show that the proposed sieve estimator is consistent and asymptotically normal, and achieves the semi-parametric efficiency bound. The finite sample performance of the proposed estimator is examined in simulation studies and a real-world data example.

Submitted 5 December, 2021; v1 submitted 7 September, 2020; originally announced September 2020.
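
The unifying idea can be illustrated by writing two classical models as different right-hand sides of one ODE for the cumulative hazard $\Lambda(t\mid x)$, with $S(t\mid x)=\exp(-\Lambda(t\mid x))$, and integrating numerically. The sketch uses a hand-written fourth-order Runge-Kutta solver; the AFT convention (covariates rescale time by $e^{\beta x}$), the Weibull baseline, and all function names are assumptions for illustration, and the paper's sieve maximum likelihood estimator is not shown.

```python
import math


def rk4(f, y0, t0, t1, steps):
    """Classical fourth-order Runge-Kutta for a scalar ODE y' = f(t, y)."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def ph_survival(t, x, beta, baseline_hazard, steps=200):
    """Proportional hazards: dLambda/dt = exp(beta x) lambda0(t), Lambda(0) = 0."""
    rhs = lambda s, lam: math.exp(beta * x) * baseline_hazard(s)
    return math.exp(-rk4(rhs, 0.0, 0.0, t, steps))


def aft_survival(t, x, beta, baseline_hazard, steps=200):
    """Accelerated failure time: dLambda/dt = exp(beta x) lambda0(t exp(beta x))."""
    rhs = lambda s, lam: math.exp(beta * x) * baseline_hazard(s * math.exp(beta * x))
    return math.exp(-rk4(rhs, 0.0, 0.0, t, steps))


def weibull2_hazard(s):
    """Baseline hazard of a Weibull(shape 2, scale 1) distribution."""
    return 2.0 * s


print(ph_survival(1.5, 1.0, 0.5, weibull2_hazard))   # closed form: exp(-e^0.5 * 1.5^2)
print(aft_survival(1.5, 1.0, 0.5, weibull2_hazard))  # closed form: exp(-(1.5 e^0.5)^2)
```

Swapping models only changes `rhs`, while the numerical machinery stays the same; this is the practical appeal of the ODE formulation described in the abstract.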

arXiv:2001.08540 [pdf, other]

Stochastic Item Descent Method for Large Scale Equal Circle Packing Problem

Authors: Kun He, Min Zhang, Jianrong Zhou, Yan **, Chu-min Li

Abstract: Stochastic gradient descent (SGD) is a powerful method for large-scale optimization problems in machine learning, especially for finite-sum formulations with numerous variables. In recent years, mini-batch SGD has achieved great success and has become a standard technique for training deep neural networks on large amounts of data. Inspired by its success in deep learning, we apply the idea of SGD with batch selection of samples to a classic optimization problem in its decision version. Given $n$ unit circles, the equal circle packing problem (ECPP) asks whether there exists a feasible packing that places all the circles inside a circular container without overlapping. Specifically, we propose a stochastic item descent method (SIDM) for large-scale ECPP, which randomly divides the unit circles into batches and iteratively runs the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm on the corresponding batch function to speed up the calculation. We also increase the batch size during the batch iterations to obtain higher-quality solutions. Compared with the current best packing algorithms, SIDM greatly speeds up the optimization process and guarantees solution quality for large-scale instances with up to 1500 circle items, while the baseline algorithms usually handle about 300 circle items. The results indicate the high efficiency of SIDM for this classic optimization problem at large scale, and show potential for other large-scale classic optimization problems in which gradient descent is used for optimization.

Submitted 21 January, 2020; originally announced January 2020.

Comments: 7 pages, 2 figures, 3 tables
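
The batch idea can be sketched on a common penalty formulation of ECPP, in which the energy sums squared overlap depths between circles and squared protrusions beyond the container, so that zero energy means a feasible packing. This penalty is an assumption about the objective. To keep the sketch NumPy-only, it swaps the paper's BFGS for gradient descent with Armijo backtracking on each batch (the other circles stay fixed), and the batch-doubling schedule is also assumed. All names are illustrative.

```python
import numpy as np


def energy_and_grad(c, R):
    """Penalty energy of unit circles with centres c (shape (n, 2)) in a container of radius R."""
    diff = c[:, None, :] - c[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)                     # ignore self-pairs
    over = np.maximum(0.0, 2.0 - dist)                 # pairwise overlap depth
    norms = np.linalg.norm(c, axis=1)
    out = np.maximum(0.0, norms + 1.0 - R)             # protrusion from the container
    energy = 0.5 * np.sum(over**2) + np.sum(out**2)    # 0.5: each pair appears twice
    grad = -2.0 * np.sum((over / np.maximum(dist, 1e-12))[:, :, None] * diff, axis=1)
    grad += 2.0 * (out / np.maximum(norms, 1e-12))[:, None] * c
    return energy, grad


def sidm(c, R, batch_size, rounds=10, steps=20, seed=0):
    """Stochastic item descent sketch: descend on random batches of circles, others fixed."""
    rng = np.random.default_rng(seed)
    c = c.copy()
    n = len(c)
    history = [energy_and_grad(c, R)[0]]
    for _ in range(rounds):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = perm[start:start + batch_size]
            for _ in range(steps):
                e, g = energy_and_grad(c, R)
                gb = g[batch]
                sq = float(np.sum(gb**2))
                if sq < 1e-20:
                    break
                s, accepted = 1.0, False
                while s > 1e-10 and not accepted:      # Armijo backtracking line search
                    trial = c.copy()
                    trial[batch] -= s * gb
                    accepted = energy_and_grad(trial, R)[0] <= e - 1e-4 * s * sq
                    s *= 0.5
                if not accepted:
                    break
                c = trial
        history.append(energy_and_grad(c, R)[0])
        batch_size = min(n, 2 * batch_size)            # grow the batch between rounds
    return c, history


rng = np.random.default_rng(0)
start = rng.uniform(-2.0, 2.0, size=(10, 2))
packed, history = sidm(start, R=4.0, batch_size=2)
print(history[0], history[-1])
```

The line search only accepts steps that decrease the energy, so the recorded energy is non-increasing from round to round.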

arXiv:1804.09829 [pdf]

Transform the Non-linear Programming Problem to the Initial-value Problem to Solve

Authors: Sheng Zhang, Fei Liao, Yi-Nan Kong, Kai-Feng He

Abstract: A dynamic method to solve the Non-linear Programming (NLP) problem with Equality Constraints (ECs) and Inequality Constraints (IECs) is proposed. Inspired by the Lyapunov continuous-time dynamics stability theory in the control field, the optimal solution is analogized to the stable equilibrium point of a finite-dimensional dynamic system and is obtained in an asymptotic manner. Under the premise that the Karush-Kuhn-Tucker (KKT) optimality condition holds, the Dynamic Optimization Equation (DOE), which has the same dimension as the optimization parameter vector, is established, and its solution converges globally to the optimal solution of the NLP with a theoretical guarantee. Using the matrix pseudo-inverse, the DOE is valid even without the linear independence regularity requirement on the nonlinear constraints. In addition, the analytic expressions of the Lagrange multipliers and KKT multipliers, which adjoin the ECs and the IECs respectively during the entire optimization process, are also derived. Via the proposed method, the NLP may be transformed into an Initial-value Problem (IVP) that can be solved with mature Ordinary Differential Equation (ODE) integration methods. Illustrative examples are solved, and it is shown that the dynamic method developed may produce correct numerical solutions with high efficiency.

Submitted 1 October, 2021; v1 submitted 25 April, 2018; originally announced April 2018.

arXiv:1804.08222 [pdf, other]

Null-free False Discovery Rate Control Using Decoy Permutations

Authors: Kun He, Mengjie Li, Yan Fu, Fuzhou Gong, Xiaoming Sun

Abstract: The traditional approaches to false discovery rate (FDR) control in multiple hypothesis testing are usually based on the null distribution of a test statistic. However, all types of null distributions, including the theoretical, permutation-based and empirical ones, have some inherent drawbacks. For example, the theoretical null might fail because of improper assumptions on the sample distribution. Here, we propose a null distribution-free approach to FDR control for multiple hypothesis testing. This approach, named the target-decoy procedure, simply builds on the ordering of tests by some statistic or score, whose null distribution is not required to be known. Competitive decoy tests are constructed from permutations of original samples and are used to estimate the false target discoveries. We prove that this approach controls the FDR when the statistics are independent between different tests. Simulations demonstrate that it is more stable and powerful than two existing popular approaches. The approach is also evaluated on a real dataset.
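A minimal sketch of competition-based target-decoy selection, under assumptions not taken from the abstract: each test has one target score and one decoy score (for example, computed on permuted samples), the larger score wins, ties are discarded, and false target discoveries above a cutoff are estimated as (decoy winners + 1). The function name and the (D + 1)/T estimator are illustrative choices, not necessarily the paper's exact procedure.

```python
from typing import List, Sequence, Tuple


def target_decoy_select(target: Sequence[float], decoy: Sequence[float],
                        alpha: float) -> List[int]:
    """Return indices of tests selected as target discoveries at nominal FDR level alpha."""
    # Competition (assumption): each test keeps the larger of its target and decoy scores.
    winners: List[Tuple[float, bool, int]] = []
    for i, (t, d) in enumerate(zip(target, decoy)):
        if t > d:
            winners.append((t, True, i))
        elif d > t:
            winners.append((d, False, i))
        # ties are dropped in this sketch
    winners.sort(key=lambda w: w[0], reverse=True)

    best_cutoff = 0
    n_target = n_decoy = 0
    for k, (_, is_target, _) in enumerate(winners, start=1):
        if is_target:
            n_target += 1
        else:
            n_decoy += 1
        # Estimated FDR among the top-k winners: (decoys + 1) / targets (assumed estimator).
        if n_target > 0 and (n_decoy + 1) / n_target <= alpha:
            best_cutoff = k
    return sorted(i for _, is_target, i in winners[:best_cutoff] if is_target)


targets = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 1.0, 0.5]
decoys = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 3.0]
print(target_decoy_select(targets, decoys, alpha=0.2))  # [0, 1, 2, 3, 4, 5]
```

Only the ordering of scores matters here, which mirrors the abstract's point that no null distribution has to be specified.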

Submitted 12 April, 2021; v1 submitted 22 April, 2018; originally announced April 2018.

Comments: 23 pages

arXiv:1802.04663 [pdf]

The Third Evolution Equation for Optimal Control Computation

Authors: Sheng Zhang, Fei Liao, Kai-Feng He

Abstract: The Variation Evolving Method (VEM), which originates from continuous-time dynamics stability theory, seeks optimal solutions using the variation evolution principle. After the first and second evolution equations were established within its framework, the third evolution equation is now developed. This equation solves only for the control variables along the variation time to obtain the optimal solution, and its definite conditions may be arbitrary since the equation can eliminate possible infeasibilities. With this equation, the dimension of the resulting Initial-value Problem (IVP), transformed via the semi-discrete method, is greatly reduced, which may relieve the computational burden of seeking solutions. Illustrative examples are solved, showing that the proposed equation may produce more precise numerical solutions than the second evolution equation, and that its computation time may be shorter for dense discretizations.

Submitted 11 February, 2018; originally announced February 2018.

Comments: Key words: Optimal control, dynamics stability, variation evolution, evolution partial differential equation, the third evolution equation, initial-value problem. arXiv admin note: substantial text overlap with arXiv:1801.10486, arXiv:1801.01383, arXiv:1801.07395, arXiv:1712.09702 and text overlap with arXiv:1802.02140, arXiv:1709.02242

arXiv:1709.05143 [pdf, other]

Variable Version Lovász Local Lemma: Beyond Shearer's Bound

Authors: Kun He, Liang Li, Xingwu Liu, Yuyi Wang, Mingji Xia

Abstract: A tight criterion under which the abstract version Lovász Local Lemma (abstract-LLL) holds was given by Shearer decades ago. However, little is known about the corresponding criterion for the variable version LLL (variable-LLL), where events are generated by independent random variables, even though this model of events applies to almost all applications of LLL. We introduce a necessary and sufficient criterion for variable-LLL, in terms of the probabilities of the events and the event-variable graph specifying the dependency among the events. Based on this new criterion, we obtain boundaries for two families of event-variable graphs, namely, cyclic and treelike bigraphs. These are the first two non-trivial cases where the variable-LLL boundary is fully determined. As a byproduct, we also provide a universal constructive method to find a set of events whose union has the maximum probability, given the probability vector and the event-variable graph. Though it is #P-hard in general to determine variable-LLL boundaries, we can to some extent decide whether a gap exists between a variable-LLL boundary and the corresponding abstract-LLL boundary. In particular, we show that the existence of a gap can be decided without solving Shearer's conditions or checking our variable-LLL criterion. Equipped with this powerful theorem, we show that there is no gap if the base graph of the event-variable graph is a tree, while a gap appears if the base graph has an induced cycle of length at least 4. The problem is almost completely solved except when the base graph has only 3-cliques, in which case we also get partial solutions. A set of reduction rules is established that makes it possible to infer gap existence of an event-variable graph from known ones. As an application, various event-variable graphs, in particular combinatorial ones, are shown to be gapful/gapless.
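For a small dependency graph, Shearer's condition for the abstract-LLL can be checked by brute force. The sketch below assumes the standard characterization that a probability vector $p$ lies in Shearer's region if and only if the independence polynomial of every induced subgraph, evaluated at $-p$, is positive. It does not implement the paper's variable-LLL criterion, the helper names are hypothetical, and the running time is exponential in the number of events.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def is_independent(subset: Tuple[int, ...], edges: Set[Edge]) -> bool:
    """True if no two events in `subset` are adjacent in the dependency graph."""
    return not any((u, v) in edges or (v, u) in edges for u, v in combinations(subset, 2))


def independence_poly_at_minus_p(vertices: Tuple[int, ...], edges: Set[Edge],
                                 p: Dict[int, float]) -> float:
    """Z_{G[vertices]}(-p) = sum over independent I of (-1)^|I| * prod_{v in I} p[v]."""
    total = 0.0
    for r in range(len(vertices) + 1):
        for subset in combinations(vertices, r):
            if is_independent(subset, edges):
                term = (-1.0) ** r
                for v in subset:
                    term *= p[v]
                total += term
    return total


def in_shearer_region(n: int, edges: Set[Edge], p: Dict[int, float]) -> bool:
    """Brute-force test (assumed characterization): Z_{G[S]}(-p) > 0 for every nonempty S."""
    for r in range(1, n + 1):
        for s in combinations(range(n), r):
            if independence_poly_at_minus_p(s, edges, p) <= 0.0:
                return False
    return True


path = {(0, 1), (1, 2)}  # abstract dependency graph: path 0 - 1 - 2
print(in_shearer_region(3, path, {0: 0.38, 1: 0.38, 2: 0.38}))  # True
print(in_shearer_region(3, path, {0: 0.39, 1: 0.39, 2: 0.39}))  # False
```

For this path the full-graph polynomial is $1 - 3p + p^2$, which changes sign between the two probability values tried.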

Submitted 15 September, 2017; originally announced September 2017.

Comments: Part of the work has been published at FOCS2017

arXiv:1609.06806 [pdf, ps, other]

doi 10.1002/sta4.123

Asymptotic properties of adaptive group Lasso for sparse reduced rank regression

Authors: Kejun He, Jianhua Z. Huang

Abstract: This paper studies the asymptotic properties of the penalized least squares estimator using an adaptive group Lasso penalty for reduced rank regression. The group Lasso penalty is defined so that the regression coefficients corresponding to each predictor are treated as one group. It is shown that under certain regularity conditions, the estimator can achieve the minimax optimal rate of convergence. Moreover, variable selection consistency can also be achieved, that is, the relevant predictors can be identified with probability approaching one. In the asymptotic theory, the number of response variables, the number of predictors, and the rank are allowed to grow to infinity with the sample size.
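The grouping described above can be illustrated with the proximal step of a weighted group Lasso penalty, where row $j$ of the coefficient matrix (the coefficients of predictor $j$ across all responses) forms one group. This is a hedged sketch of a common building block, not the paper's estimator: the adaptive weights $1/\|\tilde B_j\|^\gamma$ from an initial estimate $\tilde B$ are an assumed, conventional choice, the function names are hypothetical, and the rank constraint is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def row_norm(row: List[float]) -> float:
    return math.sqrt(sum(v * v for v in row))


def adaptive_weights(initial: Matrix, gamma: float = 1.0, eps: float = 1e-12) -> List[float]:
    """Assumed weights 1 / ||row||^gamma per predictor; eps guards rows that are exactly zero."""
    return [1.0 / max(row_norm(row), eps) ** gamma for row in initial]


def group_soft_threshold(coef: Matrix, lam: float, weights: List[float]) -> Matrix:
    """Prox of lam * sum_j w_j ||B_j||_2: shrink each row toward zero, zeroing small rows."""
    out = []
    for row, w in zip(coef, weights):
        norm = row_norm(row)
        scale = max(0.0, 1.0 - lam * w / norm) if norm > 0.0 else 0.0
        out.append([scale * v for v in row])
    return out


B = [[3.0, 4.0], [0.3, 0.4]]   # two predictors (rows), two responses (columns)
w = adaptive_weights(B)         # approximately [0.2, 2.0]
print(group_soft_threshold(B, lam=1.0, weights=w))  # approximately [[2.88, 3.84], [0.0, 0.0]]
```

Because the weights are inversely proportional to the initial row norms, the large group is barely shrunk while the small one is removed entirely, which is the mechanism behind selecting whole predictors.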

Submitted 24 October, 2016; v1 submitted 21 September, 2016; originally announced September 2016.

Journal ref: Stat, 5:1, 251-261 (2016)

arXiv:1608.05002 [pdf, ps, other]

doi 10.1073/pnas.1618780114

Bayesian Posteriors For Arbitrarily Rare Events

Authors: Drew Fudenberg, Kevin He, Lorens Imhof

Abstract: We study how much data a Bayesian observer needs to correctly infer the relative likelihoods of two events when both events are arbitrarily rare. Each period, either a blue die or a red die is tossed. The two dice land on side $1$ with unknown probabilities $p_1$ and $q_1$, which can be arbitrarily low. Given a data-generating process where $p_1\ge c q_1$, we are interested in how much data is required to guarantee that with high probability the observer's Bayesian posterior mean for $p_1$ exceeds $(1-δ)c$ times that for $q_1$. If the prior densities for the two dice are positive on the interior of the parameter space and behave like power functions at the boundary, then for every $ε>0,$ there exists a finite $N$ so that the observer obtains such an inference after $n$ periods with probability at least $1-ε$ whenever $np_1\ge N$. The condition on $n$ and $p_1$ is the best possible. The result can fail if one of the prior densities converges to zero exponentially fast at the boundary.
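As a hedged illustration of the quantities involved (not the paper's proof technique), the sketch below reduces each die to "side 1 or not" and assumes Beta$(a, b)$ priors, whose densities are positive on $(0,1)$ and behave like power functions at the boundary, as the abstract requires. With $k$ ones in $m$ tosses, the posterior mean is $(a+k)/(a+b+m)$. The simulation parameters, the helper names, and the fair choice between the two dice each period are assumptions.

```python
import random
from typing import Tuple


def beta_posterior_mean(a: float, b: float, ones: int, tosses: int) -> float:
    """Posterior mean of P(side 1) under a Beta(a, b) prior after `tosses` observations."""
    return (a + ones) / (a + b + tosses)


def simulate_ratio(n: int, p1: float, q1: float, seed: int = 0,
                   prior: Tuple[float, float] = (1.0, 1.0)) -> float:
    """Run n periods; return the posterior mean for p_1 divided by that for q_1."""
    rng = random.Random(seed)
    a, b = prior
    blue_tosses = blue_ones = red_tosses = red_ones = 0
    for _ in range(n):
        if rng.random() < 0.5:  # assumption: each die is tossed with probability 1/2
            blue_tosses += 1
            blue_ones += int(rng.random() < p1)
        else:
            red_tosses += 1
            red_ones += int(rng.random() < q1)
    return (beta_posterior_mean(a, b, blue_ones, blue_tosses)
            / beta_posterior_mean(a, b, red_ones, red_tosses))


# Here c = 2; compare the printed ratio with (1 - delta) * 2 for a chosen delta.
print(simulate_ratio(n=200_000, p1=0.002, q1=0.001, seed=1))
```

Re-running with different seeds and smaller $n p_1$ gives a feel for why the guarantee is stated in terms of $n p_1$ rather than $n$ alone.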

Submitted 22 April, 2017; v1 submitted 17 August, 2016; originally announced August 2016.

Journal ref: Proceedings of the National Academy of Sciences 114(19):4925-4929, May 2017

arXiv:1511.08015 [pdf, ps, other]

The G-convex Functions Based on the Nonlinear Expectations Defined by G-BSDEs

Authors: Kun He

Abstract: In this paper, generalizing the definition of G-convex functions given by Peng [9] during the construction of G-expectations and related properties, we define a group of G-convex functions based on Backward Stochastic Differential Equations driven by G-Brownian motions.

Submitted 25 November, 2015; originally announced November 2015.

Comments: 11 pages. arXiv admin note: text overlap with arXiv:1306.1929 by other authors

MSC Class: 60H10; 60H30

arXiv:1501.00537 [pdf]

A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics

Authors: Kun He, Yan Fu, Wen-Feng Zeng, Lan Luo, Hao Chi, Chao Liu, Lai-Yun Qing, Rui-Xiang Sun, Si-Min He

Abstract: Motivation: Target-decoy search (TDS) is currently the most popular strategy for estimating and controlling the false discovery rate (FDR) of peptide identifications in mass spectrometry-based shotgun proteomics. While this strategy is very useful in practice and has been intensively studied empirically, its theoretical foundation has not yet been well established. Result: In this work, we systematically analyze the TDS strategy in a rigorous statistical sense. We prove that the commonly used concatenated TDS provides a conservative estimate of the FDR for any given score threshold, but it cannot rigorously control the FDR. We prove that with a slight modification to the commonly used formula for FDR estimation, the peptide-level FDR can be rigorously controlled based on the concatenated TDS. We show that spectrum-level FDR control is difficult. We verify the theoretical conclusions with real mass spectrometry data.
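The commonly used concatenated-TDS estimate analyzed above can be sketched as follows, assuming each spectrum contributes its best-scoring match against the concatenated target+decoy database as a (score, is_decoy) pair: the FDR at a score threshold is estimated as the number of accepted decoy matches divided by the number of accepted target matches. The paper's modified formula for peptide-level control is not reproduced here; the function name and the zero-target convention are assumptions.

```python
from typing import Iterable, Tuple


def concatenated_tds_fdr(matches: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR at `threshold` as #decoy / #target among matches scoring >= threshold."""
    n_target = n_decoy = 0
    for score, is_decoy in matches:
        if score >= threshold:
            if is_decoy:
                n_decoy += 1
            else:
                n_target += 1
    # Convention (assumption): with no accepted target matches, report an estimate of 0.
    return n_decoy / n_target if n_target else 0.0


psms = [(50.0, False), (45.0, False), (40.0, True), (38.0, False), (30.0, True)]
print(concatenated_tds_fdr(psms, threshold=35.0))  # 1 decoy / 3 targets = 0.333...
```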

Submitted 3 January, 2015; originally announced January 2015.

Comments: 7 pages, 2 figures

arXiv:1411.5891 [pdf, ps, other]

Non-linear maps on self-adjoint operators preserving numerical radius and numerical range of Lie product

Authors: Kan He, **chuan Hou

Abstract: Let $H$ be a complex separable Hilbert space of dimension $\geq 2$ and ${\mathcal B}_s(H)$ the space of all self-adjoint operators on $H$. We give a complete classification of non-linear surjective maps on ${\mathcal B}_s(H)$ preserving, respectively, the numerical radius and the numerical range of the Lie product.

Submitted 21 November, 2014; originally announced November 2014.

Comments: 22 pages

MSC Class: 47H20; 47B49; 47A12

arXiv:1304.4316 [pdf, ps, other]

Localization of Wiener Functionals of Fractional Regularity and Applications

Authors: Kai He, Jiagang Ren, Hua Zhang

Abstract: In this paper we localize some of Watanabe's results on fractional Wiener functionals, and use them to give a precise estimate of the difference between two Donsker's delta functionals even with fractional differentiability. As an application, the convergence rate of the density of the Euler scheme for non-Markovian stochastic differential equations is obtained.

Submitted 27 March, 2014; v1 submitted 15 April, 2013; originally announced April 2013.

arXiv:1210.0433 [pdf, ps, other]

doi 10.1016/j.jfa.2012.11.005

A geometric characterization of invertible quantum measurement maps

Authors: Kan He, **-Chuan Hou, Chi-Kwong Li

Abstract: A geometric characterization is given for invertible quantum measurement maps. Denote by ${\mathcal S}(H)$ the convex set of all states (i.e., trace-1 positive operators) on Hilbert space $H$ with dim$H\leq \infty$, and $[ρ_1, ρ_2]$ the line segment joining two elements $ρ_1, ρ_2$ in ${\mathcal S}(H)$. It is shown that a bijective map $φ:{\mathcal S}(H) \rightarrow {\mathcal S}(H)$ satisfies $φ([ρ_1, ρ_2]) \subseteq [φ(ρ_1),φ(ρ_2)]$ for any $ρ_1, ρ_2 \in {\mathcal S}(H)$ if and only if $φ$ has one of the following forms $$ρ\mapsto \frac{MρM^*}{{\rm tr}(MρM^*)}\quad \hbox{or} \quad ρ\mapsto \frac{Mρ^T M^*}{{\rm tr}(Mρ^T M^*)},$$ where $M$ is an invertible bounded linear operator and $ρ^T$ is the transpose of $ρ$ with respect to an arbitrarily fixed orthonormal basis.
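The easy ("if") direction of this characterization can be checked directly for the first form. Write $c_i = {\rm tr}(Mρ_iM^*) = {\rm tr}(ρ_i M^*M)$, which is positive because $M^*M$ is bounded below by a positive multiple of the identity when $M$ is invertible, and let $ρ_t = (1-t)ρ_1 + tρ_2$ with $t \in [0,1]$. Linearity of $ρ \mapsto MρM^*$ gives

$$φ(ρ_t) = \frac{(1-t)Mρ_1M^* + tMρ_2M^*}{(1-t)c_1 + tc_2} = \frac{(1-t)c_1}{(1-t)c_1 + tc_2}\,φ(ρ_1) + \frac{tc_2}{(1-t)c_1 + tc_2}\,φ(ρ_2),$$

and the two coefficients are nonnegative and sum to $1$, so $φ(ρ_t) \in [φ(ρ_1), φ(ρ_2)]$. The transpose form follows the same way, since $ρ \mapsto ρ^T$ is linear and maps states to states; the converse direction is the substantive part of the paper and is not reproduced here.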

Submitted 1 October, 2012; originally announced October 2012.

Comments: 14 pages

MSC Class: 47B49; 47L07; 47N50

Showing 1–28 of 28 results for author: He, K