-
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Authors:
Hao-Jun Michael Shi,
Tsung-Hsien Lee,
Shintaro Iwasaki,
Jose Gallego-Posada,
Zhijing Li,
Kaushik Rangadurai,
Dheevatsa Mudigere,
Michael Rabbat
Abstract:
Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network. In this work, we provide a complete description of the algorithm as well as the performance optimizations that our implementation leverages to train deep networks at-scale in PyTorch. Our implementation enables fast multi-GPU distributed data-parallel training by distributing the memory and computation associated with blocks of each parameter via PyTorch's DTensor data structure and performing an AllGather primitive on the computed search directions at each iteration. This major performance enhancement enables us to achieve at most a 10% performance reduction in per-step wall-clock time compared against standard diagonal-scaling-based adaptive gradient methods. We validate our implementation by performing an ablation study on training ImageNet ResNet50, demonstrating Shampoo's superiority over standard training recipes with minimal hyperparameter tuning.
Submitted 12 September, 2023;
originally announced September 2023.
-
A new method for solving the equation $x^d+(x+1)^d=b$ in $\mathbb{F}_{q^4}$ where $d=q^3+q^2+q-1$
Authors:
Liqin Qian,
Minjia Shi,
Wei Lu
Abstract:
In this paper, we give a new method to answer a recent conjecture proposed by Budaghyan, Calderini, Carlet, Davidova and Kaleyski about the equation $x^d+(x+1)^d=b$ in $\mathbb{F}_{q^4}$, where $n$ is a positive integer, $q=2^n$ and $d=q^3+q^2+q-1$. In particular, we directly determine the differential spectrum of this power function $x^d$ using methods different from those in the literature. Compared with the methods in the literature, our method is more direct and simpler.
Submitted 17 May, 2023;
originally announced May 2023.
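For orientation, the smallest case of the equation in the entry above ($n=1$, $q=2$, $d=13$, field $\mathbb{F}_{16}$) can be checked by brute force. The sketch below is not the method of the paper. The function names and the choice of modulus $x^4+x+1$ for representing $\mathbb{F}_{16}$ are assumptions made purely for illustration.

```python
from collections import Counter


def gf16_mul(a, b):
    """Multiply two elements of F_16, stored as 4-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13  # reduce modulo x^4 + x + 1 (assumed representation)
    return result


def gf16_pow(a, e):
    """Square-and-multiply exponentiation in F_16 (0 maps to 0 for e > 0)."""
    result = 1
    while e:
        if e & 1:
            result = gf16_mul(result, a)
        a = gf16_mul(a, a)
        e >>= 1
    return result


def differential_counts(d):
    """Map k -> number of b in F_16 with exactly k solutions x of
    x^d + (x+1)^d = b. Addition in characteristic 2 is XOR."""
    solutions = Counter()
    for x in range(16):
        solutions[gf16_pow(x, d) ^ gf16_pow(x ^ 1, d)] += 1
    return Counter(solutions.get(b, 0) for b in range(16))


q = 2
d = q**3 + q**2 + q - 1  # d = 13 when q = 2
spectrum = differential_counts(d)
```

Since $13 \equiv -2 \pmod{15}$, the map $x \mapsto x^{13}$ on $\mathbb{F}_{16}$ is inversion followed by squaring, so `spectrum` should reproduce the known counts for field inversion: nine values of $b$ with no solution, six with two solutions, and one with four.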
-
Characterization of Plotkin-optimal two-weight codes over finite chain rings and related applications
Authors:
Shitao Li,
Minjia Shi
Abstract:
Few-weight codes over finite chain rings are associated with combinatorial objects such as strongly regular graphs (SRGs), strongly walk-regular graphs (SWRGs) and finite geometries, and are also widely used in data storage systems and secret sharing schemes. The first objective of this paper is to characterize all possible parameters of Plotkin-optimal two-homogeneous weight regular projective codes over finite chain rings, as well as their weight distributions. We show the existence of codes with these parameters by constructing an infinite family of two-homogeneous weight codes. The parameters of their Gray images have the same weight distribution as that of the two-weight codes of type SU1 in the sense of Calderbank and Kantor (Bull Lond Math Soc 18: 97-122, 1986). Further, we also construct three-homogeneous weight regular projective codes over finite chain rings combined with some known results. Finally, we study applications of our constructed codes in secret sharing schemes and graph theory. In particular, infinite families of SRGs and SWRGs with non-trivial parameters are obtained.
Submitted 15 May, 2023;
originally announced May 2023.
-
Quasi-cyclic perfect codes in Doob graphs and special partitions of Galois rings
Authors:
Minjia Shi,
Xiaoxiao Li,
Denis S. Krotov,
Ferruh Özbudak
Abstract:
The Galois ring GR$(4^Δ)$ is the residue ring $Z_4[x]/(h(x))$, where $h(x)$ is a basic primitive polynomial of degree $Δ$ over $Z_4$. For any odd $Δ$ larger than $1$, we construct a partition of GR$(4^Δ) \backslash \{0\}$ into $6$-subsets of type $\{a,b,-a-b,-a,-b,a+b\}$ and $3$-subsets of type $\{c,-c,2c\}$ such that the partition is invariant under the multiplication by a nonzero element of the Teichmüller set in GR$(4^Δ)$ and, if $Δ$ is not a multiple of $3$, under the action of the automorphism group of GR$(4^Δ)$.
As a corollary, this implies the existence of quasi-cyclic additive $1$-perfect codes of index $(2^Δ-1)$ in $D((2^Δ-1)(2^Δ-2)/{6}, 2^Δ-1 )$ where $D(m,n)$ is the Doob metric scheme on $Z^{2m+n}$.
Submitted 4 May, 2023;
originally announced May 2023.
-
An Efficient Method for Joint Delay-Doppler Estimation of Moving Targets in Passive Radar
Authors:
Mengjiao Shi,
Yunhai Xiao,
Peili Li
Abstract:
Passive radar systems can detect and track the moving targets of interest by exploiting non-cooperative illuminators-of-opportunity to transmit orthogonal frequency division multiplexing (OFDM) signals. These targets are searched using a bank of correlators tuned to the waveform corresponding to the given Doppler frequency shift and delay. In this paper, we study the problem of joint delay-Doppler estimation of moving targets in OFDM passive radar. This estimation task is described as an atomic-norm regularized convex optimization problem, or equivalently, a semi-definite programming problem. The alternating direction method of multipliers (ADMM), which computes each variable in a Gauss-Seidel manner, can be employed, but its convergence is not guaranteed. In this paper, we apply a symmetric Gauss-Seidel (sGS) technique within the ADMM framework, which only needs to compute some of the subproblems twice but is able to ensure convergence. Simulation experiments illustrate that the sGS-ADMM is superior to ADMM in terms of accuracy and computing time.
Submitted 17 February, 2023;
originally announced February 2023.
-
A family of diameter perfect constant-weight codes from Steiner systems
Authors:
Minjia Shi,
Yuhong Xia,
Denis S. Krotov
Abstract:
If $S$ is a transitive metric space, then $|C|\cdot|A| \le |S|$ for any distance-$d$ code $C$ and a set $A$, "anticode", of diameter less than $d$. For every Steiner S$(t,k,n)$ system $S$, we show the existence of a $q$-ary constant-weight code $C$ of length $n$, weight $k$ (or $n-k$), and distance $d=2k-t+1$ (respectively, $d=n-t+1$) and an anticode $A$ of diameter $d-1$ such that the pair $(C,A)$ attains the code-anticode bound and the supports of the codewords of $C$ are the blocks of $S$ (respectively, the complements of the blocks of $S$). We study the problem of estimating the minimum value of $q$ for which such a code exists, and find that minimum for small values of $t$.
Keywords: diameter perfect codes, anticodes, constant-weight codes, code-anticode bound, Steiner systems.
Submitted 31 July, 2023; v1 submitted 30 November, 2022;
originally announced December 2022.
-
Constructing MRD codes by switching
Authors:
Minjia Shi,
Denis S. Krotov,
Ferruh Özbudak
Abstract:
MRD codes are maximum codes in the rank-distance metric space on $m$-by-$n$ matrices over the finite field of order $q$. They are diameter perfect and have the cardinality $q^{m(n-d+1)}$ if $m\ge n$. We define switching in MRD codes as replacing special MRD subcodes by other subcodes with the same parameters. We consider constructions of MRD codes admitting such switching, including punctured twisted Gabidulin codes and direct-product codes. Using switching, we construct a huge class of MRD codes whose cardinality grows doubly exponentially in $m$ if the other parameters ($n$, $q$, the code distance) are fixed. Moreover, we construct MRD codes with different affine ranks and aperiodic MRD codes.
Keywords: MRD codes, rank distance, bilinear forms graph, switching, diameter perfect codes
Submitted 1 November, 2022;
originally announced November 2022.
-
Decompositions of Local mixed Morrey-type spaces and Application
Authors:
Mingwei Shi,
Jiang Zhou
Abstract:
In this paper, we obtain predual spaces of local mixed Morrey-type spaces and characterize mixed Hardy local Morrey-type spaces. We further investigate the nonsmooth decomposition of local mixed Morrey-type spaces. As an application, we consider the Hardy operators on local mixed Morrey-type spaces.
Submitted 13 September, 2022;
originally announced September 2022.
-
The local Morrey-type space Associated with Ball Quasi-Banach Function Spaces and Application
Authors:
Mingwei Shi,
Jiang Zhou
Abstract:
In this paper, we define for the first time the local Morrey-type space associated with ball quasi-Banach function spaces and establish a series of related properties. In addition, the boundedness of the Hardy-Littlewood maximal operator is proved. We investigate the nonsmooth decomposition of the local Morrey-type space associated with ball quasi-Banach function spaces via the Hardy local Morrey-type spaces associated with ball quasi-Banach function spaces. We also consider the boundedness of the Hardy operator.
Submitted 8 September, 2022;
originally announced September 2022.
-
Homogeneous mixed Herz-Morrey spaces and its Applications
Authors:
Mingwei Shi,
Jiang Zhou
Abstract:
In this paper, we introduce homogeneous mixed Herz-Morrey spaces $M\dot{K}_{p,\vec{q}}^{α,λ}(\mathbb{R}^n)$ and establish some of their properties. Firstly, the boundedness of sublinear operators and fractional type operators in homogeneous mixed Herz-Morrey spaces is investigated. In particular, the above results are still valid for Calderón-Zygmund operators and fractional maximal operators. Lastly, the boundedness of their commutators in homogeneous mixed Herz-Morrey spaces is obtained.
Submitted 3 July, 2022;
originally announced July 2022.
-
On the coset graph construction of distance-regular graphs
Authors:
Minjia Shi,
Denis S. Krotov,
Patrick Solé
Abstract:
We show that no new distance-regular graphs in the tables of the book by Brouwer, Cohen, and Neumaier (1989) can be produced by using the coset graph of additive completely regular codes over finite fields.
Submitted 31 May, 2022;
originally announced June 2022.
-
Self-dual Hadamard bent sequences
Authors:
Minjia Shi,
Yaya Li,
Wei Cheng,
Dean Crnković,
Denis Krotov,
Patrick Solé
Abstract:
A new notion of bent sequence related to Hadamard matrices was introduced recently, motivated by a security application (Solé et al., 2021). We study the self-dual class in length at most $196$. We use three competing methods of generation: Exhaustion, Linear Algebra and Groebner bases. Regular Hadamard matrices and Bush-type Hadamard matrices provide many examples. We conjecture that if $v$ is an even perfect square, a self-dual bent sequence of length $v$ always exists. We introduce the strong automorphism group of Hadamard matrices, which acts on their associated self-dual bent sequences. We give an efficient algorithm to compute that group.
Submitted 22 June, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
The b-symbol weight hierarchy of the Kasami codes
Authors:
Hongwei Zhu,
Minjia Shi
Abstract:
The symbol-pair read channel was first proposed by Cassuto and Blaum. Later, Yaakobi et al. generalized it to the $b$-symbol read channel. It is motivated by the limitations of the reading process in high density data storage systems. One main task in $b$-symbol coding theory is to determine the $b$-symbol weight hierarchy of codes. In this paper, we study the $b$-symbol weight hierarchy of the Kasami codes, which are well known for their applications to construct sequences with optimal correlation magnitudes. The complete symbol-pair weight distribution of the Kasami codes is determined.
Submitted 7 December, 2021;
originally announced December 2021.
-
Adaptive Finite-Difference Interval Estimation for Noisy Derivative-Free Optimization
Authors:
Hao-Jun Michael Shi,
Yuchen Xie,
Melody Qiming Xuan,
Jorge Nocedal
Abstract:
A common approach for minimizing a smooth nonlinear function is to employ finite-difference approximations to the gradient. While this can be easily performed when no error is present within the function evaluations, when the function is noisy, the optimal choice requires information about the noise level and higher-order derivatives of the function, which is often unavailable. Given the noise level of the function, we propose a bisection search for finding a finite-difference interval for any finite-difference scheme that balances the truncation error, which arises from the error in the Taylor series approximation, and the measurement error, which results from noise in the function evaluation. Our procedure produces reliable estimates of the finite-difference interval at low cost without explicitly approximating higher-order derivatives. We show its numerical reliability and accuracy on a set of test problems. When combined with L-BFGS, we obtain a robust method for minimizing noisy black-box functions, as illustrated on a subset of unconstrained CUTEst problems with synthetically added noise.
Submitted 22 March, 2022; v1 submitted 12 October, 2021;
originally announced October 2021.
-
An enumeration of 1-perfect ternary codes
Authors:
Minjia Shi,
Denis S. Krotov
Abstract:
We study codes with parameters of the ternary Hamming $(n=(3^m-1)/2,3^{n-m},3)$ code, i.e., ternary $1$-perfect codes. The rank of the code is defined to be the dimension of its affine span. We characterize ternary $1$-perfect codes of rank $n-m+1$, count their number, and prove that all such codes can be obtained from each other by a sequence of two-coordinate switchings. We enumerate ternary $1$-perfect codes of length $13$ obtained by concatenation from codes of lengths $9$ and $4$; we find that there are $93241327$ equivalence classes of such codes.
Keywords: perfect codes, ternary codes, concatenation, switching.
Submitted 8 April, 2023; v1 submitted 12 October, 2021;
originally announced October 2021.
-
On $q$-ary shortened-$1$-perfect-like codes
Authors:
Minjia Shi,
Rongsheng Wu,
Denis S. Krotov
Abstract:
We study codes with parameters of $q$-ary shortened Hamming codes, i.e., $(n=(q^m-q)/(q-1), q^{n-m}, 3)_q$. Firstly, we prove the fact mentioned in 1998 by Brouwer et al. that such codes are optimal, generalizing it to a bound for multifold packings of radius-$1$ balls, with a corollary for multiple coverings. In particular, we show that the punctured Hamming code is an optimal $q$-fold packing with minimum distance $2$. Secondly, for every admissible length starting from $n=20$, we show the existence of $4$-ary codes with parameters of shortened $1$-perfect codes that cannot be obtained by shortening a $1$-perfect code.
Keywords: Hamming graph, multifold packings, multiple coverings, perfect codes.
Submitted 28 June, 2023; v1 submitted 11 October, 2021;
originally announced October 2021.
-
The $q$-ary antiprimitive BCH codes
Authors:
Hongwei Zhu,
Minjia Shi,
Xiaoqiang Wang,
Tor Helleseth
Abstract:
It is well-known that cyclic codes have efficient encoding and decoding algorithms. In recent years, antiprimitive BCH codes have attracted a lot of attention. The objective of this paper is to study BCH codes of this type over finite fields and analyse their parameters. Some lower bounds on the minimum distance of antiprimitive BCH codes are given. The BCH codes presented in this paper have good…
▽ More
It is well-known that cyclic codes have efficient encoding and decoding algorithms. In recent years, antiprimitive BCH codes have attracted a lot of attention. The objective of this paper is to study BCH codes of this type over finite fields and analyse their parameters. Some lower bounds on the minimum distance of antiprimitive BCH codes are given. The BCH codes presented in this paper have good parameters in general, containing many optimal linear codes. In particular, two open problems about the minimum distance of BCH codes of this type are partially solved in this paper.
Submitted 28 September, 2021;
originally announced September 2021.
-
Characterization of the boundedness of generalized fractional integral and maximal operators on Orlicz-Morrey and weak Orlicz-Morrey spaces
Authors:
Ryota Kawasumi,
Eiichi Nakai,
Minglei Shi
Abstract:
We give necessary and sufficient conditions for the boundedness of generalized fractional integral and maximal operators on Orlicz-Morrey and weak Orlicz-Morrey spaces. To do this we prove the weak-weak type modular inequality of the Hardy-Littlewood maximal operator with respect to the Young function. Orlicz-Morrey spaces contain $L^p$ spaces ($1\le p\le\infty$), Orlicz spaces and generalized Morrey spaces as special cases. Hence we get necessary and sufficient conditions on these function spaces as corollaries.
Submitted 22 July, 2021;
originally announced July 2021.
-
Designs, permutations, and transitive groups
Authors:
Minjia Shi,
XiaoXiao Li,
Patrick Solé
Abstract:
A notion of $t$-designs in the symmetric group on $n$ letters was introduced by Godsil in 1988. In particular, $t$-transitive sets of permutations form a $t$-design. We derive special lower bounds for $t=1$ and $t=2$ by a power moment method. For general $n,t$ we give a linear programming lower bound. For $n\ge 4$ and $t=2$, this bound is strong enough to show a lower bound on the size of such $t$-designs of $n(n-1)\dots (n-t+1)$, which is best possible when sharply $t$-transitive sets of permutations exist. This shows, in particular, that tight $2$-designs do not exist.
Submitted 24 June, 2023; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Quantized State Feedback Stabilization of Nonlinear Systems under Denial-of-Service
Authors:
Mingming Shi,
Shuai Feng,
Hideaki Ishii
Abstract:
This paper studies the resilient control of networked systems in the presence of cyber attacks. In particular, we consider the state feedback stabilization problem for nonlinear systems when the state measurement is sent to the controller via a communication channel that only has a finite transmitting rate and is moreover subject to cyber attacks in the form of Denial-of-Service (DoS). We use a dynamic quantization method to update the quantization range of the encoder/decoder and characterize the number of bits for quantization needed to stabilize the system under a given level of DoS attacks in terms of duration and frequency. Our theoretical result shows that under DoS attacks, the required data bits to stabilize nonlinear systems by state feedback control are larger than those without DoS since the communication interruption induced by DoS makes the quantization uncertainty expand more between two successful transmissions. Even so, in the simulation, we show that the actual quantization bits can be much smaller than the theoretical value.
Submitted 9 April, 2021;
originally announced April 2021.
-
Are energy savings the only reason for the emergence of bird echelon formation?
Authors:
Mingming Shi,
Julien M. Hendrickx
Abstract:
We analyze the conditions under which the emergence of frequently observed echelon formation can be explained solely by the maximization of energy savings. We consider a two-dimensional multi-agent echelon formation, where each agent receives a benefit that depends on its position relative to the others, and adjusts its position to increase this benefit. We analyze the selfish case where each agent maximizes its own benefit, leading to a Nash-equilibrium problem, and the collaborative case in which agents maximize the global benefit of the group. We provide conditions on the benefit function under which the frequently observed echelon formations cannot be Nash equilibria or group optima.
We then show that these conditions are satisfied by the conventionally used fixed-wing wake benefit model. This implies that energy saving alone is not sufficient to explain the emergence of the migratory formations observed, based on the fixed-wing model. Hence, either non-aerodynamic aspects or a more accurate model of bird dynamics should be considered to construct such formations.
Submitted 24 March, 2021;
originally announced March 2021.
-
LCD Codes from tridiagonal Toeplitz matrices
Authors:
Minjia Shi,
Ferruh Özbudak,
Li Xu,
Patrick Solé
Abstract:
Double Toeplitz (DT) codes are codes with a generator matrix of the form $(I,T)$ with $T$ a Toeplitz matrix, that is to say, constant on the diagonals parallel to the main diagonal. When $T$ is tridiagonal and symmetric, we determine its spectrum explicitly by using Dickson polynomials, and deduce from there conditions for the code to be LCD. Using a special concatenation process, we construct optimal or quasi-optimal examples of binary and ternary LCD codes from DT codes over extension fields.
Submitted 7 March, 2021;
originally announced March 2021.
-
On the Numerical Performance of Derivative-Free Optimization Methods Based on Finite-Difference Approximations
Authors:
Hao-Jun Michael Shi,
Melody Qiming Xuan,
Figen Oztoprak,
Jorge Nocedal
Abstract:
The goal of this paper is to investigate an approach for derivative-free optimization that has not received sufficient attention in the literature and is yet one of the simplest to implement and parallelize. It consists of computing gradients of a smoothed approximation of the objective function (and constraints), and employing them within established codes. These gradient approximations are calculated by finite differences, with a differencing interval determined by the noise level in the functions and a bound on the second or third derivatives. It is assumed that the noise level is known or can be estimated by means of difference tables or sampling. The use of finite differences has been largely dismissed in the derivative-free optimization literature as too expensive in terms of function evaluations and/or as impractical when the objective function contains noise. The test results presented in this paper suggest that such views should be re-examined and that the finite-difference approach has much to be recommended. The tests compared NEWUOA, DFO-LS and COBYLA against the finite-difference approach on three classes of problems: general unconstrained problems, nonlinear least squares, and general nonlinear programs with equality constraints.
Submitted 19 February, 2021;
originally announced February 2021.
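The entry above describes a differencing interval determined by the noise level and a derivative bound. A minimal sketch of that idea for forward differences follows, using the standard rule that minimizes the error bound $L h/2 + 2\epsilon_f/h$, where $\epsilon_f$ bounds the noise and $L$ bounds the second derivative. This is a textbook rule, not necessarily the exact procedure used in the paper. The function names and the test function are hypothetical.

```python
import math


def forward_difference_interval(noise_level, second_derivative_bound):
    """Interval h that minimizes the bound L*h/2 + 2*eps_f/h."""
    return 2.0 * math.sqrt(noise_level / second_derivative_bound)


def forward_difference(f, x, h):
    """Forward-difference estimate of f'(x) with interval h."""
    return (f(x + h) - f(x)) / h


def noisy_quadratic(x, noise_level=1e-6):
    # Illustrative objective (an assumption): x^2 plus a bounded,
    # deterministic perturbation standing in for evaluation noise.
    return x * x + noise_level * math.sin(1e5 * x)


eps_f = 1e-6  # assumed known noise level
L2 = 2.0      # bound on the second derivative of x^2
h = forward_difference_interval(eps_f, L2)
estimate = forward_difference(noisy_quadratic, 1.0, h)

# Worst-case error of the estimate at the chosen interval.
error_bound = 2.0 * math.sqrt(eps_f * L2)
```

With these values, `estimate` approximates the true derivative 2 at $x=1$ to within `error_bound`. A much smaller `h` would amplify the noise term, and a much larger one would inflate the truncation term.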
-
Designs in finite metric spaces: a probabilistic approach
Authors:
Minjia Shi,
Olivier Rioul,
Patrick Solé
Abstract:
A finite metric space is called here distance degree regular if its distance degree sequence is the same for every vertex. A notion of designs in such spaces is introduced that generalizes that of designs in $Q$-polynomial distance-regular graphs. An approximation of their cumulative distribution function, based on the notion of Christoffel function in approximation theory, is given. As an application, we derive limit laws on the weight distributions of binary orthogonal arrays of strength going to infinity. An analogous result for combinatorial designs of strength going to infinity is given.
Submitted 16 February, 2021;
originally announced February 2021.
-
Zero sum sets in abelian groups
Authors:
Minjia Shi,
Denis S. Krotov,
Xiaoxiao Li,
Patrick Solé
Abstract:
The distribution of cardinalities of zero-sum sets in abelian groups is completely determined. A complex summation involving the Möbius function is given for the general abelian group, while in many special cases, including the case of elementary abelian groups, solved earlier by Li and Wan, it has a compact form. The proof involves two different Möbius transforms, on positive integers and on set partitions.
Submitted 7 February, 2021; v1 submitted 29 January, 2021;
originally announced February 2021.
-
Limit theorems and ergodicity for general bootstrap random walks
Authors:
A. Collevecchio,
K. Hamza,
M. Shi,
R. J. Williams
Abstract:
Given the increments of a simple symmetric random walk $(X_n)_{n\ge0}$, we characterize all possible ways of recycling these increments into a simple symmetric random walk $(Y_n)_{n\ge0}$ adapted to the filtration of $(X_n)_{n\ge0}$. We study the long term behavior of a suitably normalized two-dimensional process $((X_n,Y_n))_{n\ge0}$. In particular, we provide necessary and sufficient conditions for the process to converge to a two-dimensional Brownian motion (possibly degenerate). We also discuss cases in which the limit is not Gaussian. Finally, we provide a simple necessary and sufficient condition for the ergodicity of the recycling transformation, thus generalizing results from Dubins and Smorodinsky (1992) and Fujita (2008), and solving the discrete version of the open problem of the ergodicity of the general Lévy transformation (see Mansuy and Yor, 2006).
Submitted 30 June, 2021; v1 submitted 29 October, 2020;
originally announced October 2020.
-
A Noise-Tolerant Quasi-Newton Algorithm for Unconstrained Optimization
Authors:
Hao-Jun Michael Shi,
Yuchen Xie,
Richard Byrd,
Jorge Nocedal
Abstract:
This paper describes an extension of the BFGS and L-BFGS methods for the minimization of a nonlinear function subject to errors. This work is motivated by applications that contain computational noise, employ low-precision arithmetic, or are subject to statistical noise. The classical BFGS and L-BFGS methods can fail in such circumstances because the updating procedure can be corrupted and the line search can behave erratically. The proposed method addresses these difficulties and ensures that the BFGS update is stable by employing a lengthening procedure that spaces out the points at which gradient differences are collected. A new line search, designed to tolerate errors, guarantees that the Armijo-Wolfe conditions are satisfied under most reasonable conditions, and works in conjunction with the lengthening procedure. The proposed methods are shown to enjoy convergence guarantees for strongly convex functions. Detailed implementations of the methods are presented, together with encouraging numerical results.
Submitted 8 September, 2021; v1 submitted 8 October, 2020;
originally announced October 2020.
-
Search for a moving target in a competitive environment
Authors:
Benoit Duvocelle,
János Flesch,
Hui Min Shi,
Dries Vermeulen
Abstract:
We consider a discrete-time dynamic search game in which a number of players compete to find an invisible object that is moving according to a time-varying Markov chain. We examine the subgame perfect equilibria of these games. The main result of the paper is that the set of subgame perfect equilibria is exactly the set of greedy strategy profiles, i.e. those strategy profiles in which the players always choose an action that maximizes their probability of immediately finding the object. We discuss various variations and extensions of the model.
Submitted 25 August, 2020; v1 submitted 21 August, 2020;
originally announced August 2020.
-
Commutators of integral operators with functions in Campanato spaces on Orlicz-Morrey spaces
Authors:
Minglei Shi,
Ryutaro Arai,
Eiichi Nakai
Abstract:
We consider the commutators $[b,T]$ and $[b,I_ρ]$ on Orlicz-Morrey spaces, where $T$ is a Calderón-Zygmund operator, $I_ρ$ is a generalized fractional integral operator and $b$ is a function in generalized Campanato spaces. We give a necessary and sufficient condition for the boundedness of the commutators on Orlicz-Morrey spaces. To do this we prove the boundedness of generalized fractional maximal operators on Orlicz-Morrey spaces. Moreover, we introduce Orlicz-Campanato spaces and establish their relations to Orlicz-Morrey spaces.
Submitted 30 June, 2020;
originally announced July 2020.
-
On the number of frequency hypercubes $F^n(4;2,2)$
Authors:
Minjia Shi,
Shukai Wang,
Xiaoxiao Li,
Denis S. Krotov
Abstract:
A frequency $n$-cube $F^n(4;2,2)$ is an $n$-dimensional $4$-by-...-by-$4$ array filled by $0$s and $1$s such that each line contains exactly two $1$s. We classify the frequency $4$-cubes $F^4(4;2,2)$, find a testing set of size $25$ for $F^3(4;2,2)$, and derive an upper bound on the number of $F^n(4;2,2)$. Additionally, for any $n$ greater than $2$, we construct an $F^n(4;2,2)$ that cannot be refined to a latin hypercube, while each of its sub-$F^{n-1}(4;2,2)$ can.
Keywords: frequency hypercube, frequency square, latin hypercube, testing set, MDS code
Submitted 21 April, 2021; v1 submitted 21 May, 2020;
originally announced May 2020.
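The definition of a frequency $n$-cube $F^n(4;2,2)$ given in this abstract can be checked mechanically. The sketch below is illustrative only: the helper names and the cyclic example are assumptions, not constructions from the paper. A "line" is obtained by fixing all coordinates but one.

```python
from itertools import product


def is_frequency_cube(cell, n, order=4, ones_per_line=2):
    """Check that every axis-parallel line of an n-dimensional 0/1 array
    (given as a function cell(index_tuple)) has exactly ones_per_line ones."""
    for axis in range(n):
        for rest in product(range(order), repeat=n - 1):
            line_sum = 0
            for v in range(order):
                idx = rest[:axis] + (v,) + rest[axis:]
                value = cell(idx)
                if value not in (0, 1):
                    return False
                line_sum += value
            if line_sum != ones_per_line:
                return False
    return True


def cyclic_example(idx):
    """Hypothetical example: 1 where the coordinate sum is 0 or 1 modulo 4."""
    return 1 if sum(idx) % 4 < 2 else 0
```

Along any line the coordinate sum runs through all four residues modulo 4, so the cyclic example has exactly two ones per line in every dimension.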
-
Three-weight codes over rings and strongly walk regular graphs
Authors:
Michael Kiermaier,
Sascha Kurz,
Minjia Shi,
Patrick Solé
Abstract:
We construct strongly walk-regular graphs as coset graphs of the duals of codes with three non-zero homogeneous weights over $\mathbb{Z}_{p^m},$ for $p$ a prime, and more generally over chain rings of depth $m$, and with a residue field of size $q$, a prime power. Infinite families of examples are built from Kerdock and generalized Teichmüller codes. As a byproduct, we give an alternative proof that the Kerdock code is nonlinear.
Submitted 9 December, 2019;
originally announced December 2019.
-
Solving dynamic multi-objective optimization problems via support vector machine
Authors:
Min Jiang,
Weizhen Hu,
Liming Qiu,
Minghui Shi,
Kay Chen Tan
Abstract:
Dynamic Multi-objective Optimization Problems (DMOPs) are optimization problems whose objective functions change with time. Solving DMOPs implies that the Pareto Optimal Set (POS) at different moments can be accurately found, and this is a very difficult job due to the dynamics of the optimization problems. The POS obtained in the past can help us find the POS at the next moment more quickly and accurately. Therefore, in this paper we present a Support Vector Machine (SVM) based Dynamic Multi-Objective Evolutionary optimization Algorithm, called SVM-DMOEA. The algorithm uses the POS that has been obtained to train an SVM and then uses the trained SVM to classify the solutions of the dynamic optimization problem at the next moment, and thus it is able to generate an initial population which consists of different individuals recognized by the trained SVM. The initial population can be fed into any population-based optimization algorithm, e.g., the Nondominated Sorting Genetic Algorithm II (NSGA-II), to get the POS at that moment. The experimental results show the validity of our proposed approach.
Submitted 19 October, 2019;
originally announced October 2019.
-
On the number of resolvable Steiner triple systems of small 3-rank
Authors:
Minjia Shi,
Li Xu,
Denis S. Krotov
Abstract:
In a recent work, Jungnickel, Magliveras, Tonchev, and Wassermann derived an overexponential lower bound on the number of nonisomorphic resolvable Steiner triple systems (STS) of order $v$, where $v=3^k$, and $3$-rank $v-k$. We develop an approach to generalize this bound and estimate the number of isomorphism classes of STS$(v)$ of rank $v-k-1$ for an arbitrary $v$ of form $3^kT$.
Submitted 29 June, 2019;
originally announced July 2019.
-
Sufficient conditions for STS$(3^k)$ of 3-rank $\leq 3^k-r$ to be resolvable
Authors:
Yaqi Lu,
Minjia Shi
Abstract:
Based on the structure of non-full-$3$-rank $STS(3^k)$ and the orthogonal Latin squares, we mainly give sufficient conditions for $STS(3^k)$ of $3$-rank $\leq 3^k-r$ to be resolvable in the present paper. Under the conditions, the block set of $STS(3^k)$ can be partitioned into $\frac{3^k-1}{2}$ parallel classes, i.e., $\frac{3^k-1}{2}$ $1$-$(v,3,1)$ designs. Finally, we prove that $STS(3^k)$ of 3-rank $\leq 3^k-r$ is resolvable under the sufficient conditions.
Submitted 3 June, 2019;
originally announced June 2019.
-
The Steiner triple systems of order 21 with a transversal subdesign TD(3,6)
Authors:
Yue Guan,
Minjia Shi,
Denis S. Krotov
Abstract:
We prove several structural properties of Steiner triple systems (STS) of order 3w+3 that include one or more transversal subdesigns TD(3,w). Using an exhaustive search, we find that there are 2004720 isomorphism classes of STS(21) including a subdesign TD(3,6), or, equivalently, a 6-by-6 latin square.
Submitted 23 May, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.
-
Bias estimation in sensor networks
Authors:
Mingming Shi,
Claudio De Persis,
Pietro Tesi,
Nima Monshizadeh
Abstract:
This paper investigates the problem of estimating biases affecting relative state measurements in a sensor network. Each sensor measures the relative states of its neighbors and this measurement is corrupted by a constant bias. We analyse under what conditions on the network topology and the maximum number of biased sensors the biases can be correctly estimated. We show that for non-bipartite graphs the biases can always be determined even when all the sensors are corrupted, while for bipartite graphs more than half of the sensors should be unbiased to ensure the correctness of the bias estimation. If the biases are heterogeneous, then the number of unbiased sensors can be reduced to two. Based on these conditions, we propose some algorithms to estimate the biases.
Submitted 22 May, 2019;
originally announced May 2019.
-
Generalized fractional integral operators and their commutators with functions in generalized Campanato spaces on Orlicz spaces
Authors:
Minglei Shi,
Ryutaro Arai,
Eiichi Nakai
Abstract:
We investigate the commutators $[b,I_ρ]$ of generalized fractional integral operators $I_ρ$ with functions $b$ in generalized Campanato spaces and give a necessary and sufficient condition for the boundedness of the commutators on Orlicz spaces. To do this we define Orlicz spaces with generalized Young functions and prove the boundedness of generalized fractional maximal operators on the Orlicz spaces.
Submitted 21 December, 2018;
originally announced December 2018.
-
Generalized fractional maximal and integral operators on Orlicz and generalized Orlicz--Morrey spaces of the third kind
Authors:
Fatih Deringoz,
Vagif S. Guliyev,
Eiichi Nakai,
Yoshihiro Sawano,
Minglei Shi
Abstract:
In the present paper, we will characterize the boundedness of the generalized fractional integral operators $I_ρ$ and the generalized fractional maximal operators $M_ρ$ on Orlicz spaces, respectively. Moreover, we will give a characterization for the Spanne-type boundedness and the Adams-type boundedness of the operators $M_ρ$ and $I_ρ$ on generalized Orlicz--Morrey spaces, respectively. Also we give criteria for the weak versions of the Spanne-type boundedness and the Adams-type boundedness of the operators $M_ρ$ and $I_ρ$ on generalized Orlicz--Morrey spaces.
Submitted 10 December, 2018;
originally announced December 2018.
-
A new approach to the Kasami codes of type 2
Authors:
Minjia Shi,
Denis Krotov,
Patrick Solé
Abstract:
The dual of the Kasami code of length $q^2-1$, with $q$ a power of $2$, is constructed by concatenating a cyclic MDS code of length $q+1$ over $F_q$ with a Simplex code of length $q-1$. This yields a new derivation of the weight distribution of the Kasami code, a new description of its coset graph, and a new proof that the Kasami code is completely regular. The automorphism groups of the Kasami code and the related $q$-ary MDS code are determined. New cyclic completely regular codes over finite fields a power of $2$, generalized Kasami codes, are constructed; they have coset graphs isomorphic to that of the Kasami codes. Another wide class of completely regular codes, including additive codes, as well as unrestricted codes, is obtained by combining cosets of the Kasami or generalized Kasami code.
Submitted 26 June, 2023; v1 submitted 28 September, 2018;
originally announced October 2018.
-
A new distance-regular graph of diameter 3 on 1024 vertices
Authors:
Minjia Shi,
Denis Krotov,
Patrick Solé
Abstract:
The dodecacode is a nonlinear additive quaternary code of length $12$. By puncturing it at any of the twelve coordinates, we obtain a uniformly packed code of distance $5$. In particular, this latter code is completely regular but not completely transitive. Its coset graph is distance-regular of diameter three on $2^{10}$ vertices, with new intersection array $\{33,30,15;1,2,15\}$. The automorphism groups of the code, and of the graph, are determined. Connecting the vertices at distance two gives a strongly regular graph of (previously known) parameters $(2^{10},495,238,240)$. Another strongly regular graph with the same parameters is constructed on the codewords of the dual code. A non trivial completely regular binary code of length $33$ is constructed.
Submitted 5 November, 2018; v1 submitted 19 June, 2018;
originally announced June 2018.
-
The number of the non-full-rank Steiner triple systems
Authors:
Minjia Shi,
Li Xu,
Denis S. Krotov
Abstract:
The $p$-rank of a Steiner triple system $B$ is the dimension of the linear span of the set of characteristic vectors of blocks of $B$, over GF$(p)$. We derive a formula for the number of different Steiner triple systems of order $v$ and given $2$-rank $r_2$, $r_2<v$, and a formula for the number of Steiner triple systems of order $v$ and given $3$-rank $r_3$, $r_3<v-1$. Also, we prove that there are no Steiner triple systems of $2$-rank smaller than $v$ and, at the same time, $3$-rank smaller than $v-1$. Our results extend previous work on enumerating Steiner triple systems according to the rank of their codes, mainly by Tonchev, V.A.Zinoviev and D.V.Zinoviev for the binary case and by Jungnickel and Tonchev for the ternary case.
Submitted 6 March, 2021; v1 submitted 31 May, 2018;
originally announced June 2018.
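The $p$-rank defined in this abstract (the dimension over GF$(p)$ of the span of the blocks' characteristic vectors) can be computed by Gaussian elimination modulo $p$. The following is a small sketch under that definition; the function names are hypothetical, and the Fano plane is used only as a familiar STS(7) example, not as data from the paper.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over GF(p), p prime, by Gaussian elimination."""
    m = [[v % p for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], p - 2, p)  # inverse via Fermat's little theorem
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(a - factor * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def p_rank(blocks, v, p):
    """p-rank of a triple system on points 0..v-1 given as a list of blocks."""
    rows = [[1 if point in block else 0 for point in range(v)] for block in blocks]
    return rank_mod_p(rows, p)


# The Fano plane, the unique STS(7), with points labelled 0..6.
FANO = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
```

For the Fano plane, `p_rank(FANO, 7, 2)` evaluates to 4, which is smaller than $v=7$, so it is a non-full-2-rank system in the sense of the abstract.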
-
A Progressive Batching L-BFGS Method for Machine Learning
Authors:
Raghu Bollapragada,
Dheevatsa Mudigere,
Jorge Nocedal,
Hao-Jun Michael Shi,
Ping Tak Peter Tang
Abstract:
The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization properties, L-BFGS is currently not considered an algorithm of choice for large-scale machine learning applications. One need not, however, choose between the two extremes represented by the full batch or highly stochastic regimes, and may instead follow a progressive batching approach in which the sample size increases during the course of the optimization. In this paper, we present a new version of the L-BFGS algorithm that combines three basic components - progressive batching, a stochastic line search, and stable quasi-Newton updating - and that performs well on training logistic regression and deep neural networks. We provide supporting convergence theory for the method.
Submitted 30 May, 2018; v1 submitted 14 February, 2018;
originally announced February 2018.
-
Diagonal implicit symplectic ERKN methods for solving oscillatory Hamiltonian systems
Authors:
Mingxue Shi,
Hao Zhang,
Bin Wang
Abstract:
This paper studies diagonal implicit symplectic extended Runge--Kutta--Nyström (ERKN) methods for solving the oscillatory Hamiltonian system $H(q,p)=\dfrac{1}{2}p^{T}p+\dfrac{1}{2}q^{T}Mq+U(q)$. Based on symplectic conditions and order conditions, we construct some diagonal implicit symplectic ERKN methods. The stability of the obtained methods is discussed. Three numerical experiments are carried out and the numerical results demonstrate the remarkable numerical behavior of the new diagonal implicit symplectic methods when applied to the oscillatory Hamiltonian system.
Submitted 1 December, 2017;
originally announced December 2017.
-
On the proximity of large primes
Authors:
Minjia Shi,
Florian Luca,
Patrick Solé
Abstract:
By a sphere-packing argument, we show that there are infinitely many pairs of primes that are close to each other for some metrics on the integers. In particular, for any numeration basis $q$, we show that there are infinitely many pairs of primes the base $q$ expansion of which differ in at most two digits. Likewise, for any fixed integer $t,$ there are infinitely many pairs of primes, the first $t$ digits of which are the same. In another direction, we show that, there is a constant $c$ depending on $q$ such that for infinitely many integers $m$ there are at least $c\log \log m$ primes which differ from $m$ by at most one base $q$ digit.
Submitted 15 November, 2017;
originally announced November 2017.
-
On Codes over $\mathbb{F}_{q}+v\mathbb{F}_{q}+v^{2}\mathbb{F}_{q}$
Authors:
A. Melakhessou,
K. Guenda,
T. A. Gulliver,
M. Shi,
P. Solé
Abstract:
In this paper we investigate linear codes with complementary dual (LCD) codes and formally self-dual codes over the ring $R=\F_{q}+v\F_{q}+v^{2}\F_{q}$, where $v^{3}=v$, for $q$ odd. We give conditions on the existence of LCD codes and present construction of formally self-dual codes over $R$. Further, we give bounds on the minimum distance of LCD codes over $\F_q$ and extend these to codes over $R$.
Submitted 11 April, 2017;
originally announced April 2017.
-
A Primer on Coordinate Descent Algorithms
Authors:
Hao-Jun Michael Shi,
Shenyinying Tu,
Yangyang Xu,
Wotao Yin
Abstract:
This monograph presents a class of algorithms called coordinate descent algorithms for mathematicians, statisticians, and engineers outside the field of optimization. This particular class of algorithms has recently gained popularity due to their effectiveness in solving large-scale optimization problems in machine learning, compressed sensing, image processing, and computational statistics. Coordinate descent algorithms solve optimization problems by successively minimizing along each coordinate or coordinate hyperplane, which is ideal for parallelized and distributed computing. Avoiding detailed technicalities and proofs, this monograph gives relevant theory and examples for practitioners to effectively apply coordinate descent to modern problems in data science and engineering.
Submitted 12 January, 2017; v1 submitted 30 September, 2016;
originally announced October 2016.
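The idea of successively minimizing along each coordinate can be sketched on the simplest case. The example below assumes a strictly convex quadratic objective $f(x)=\tfrac12 x^{T}Ax-b^{T}x$ with symmetric positive definite $A$, for which each one-dimensional minimization has a closed form; this cyclic variant coincides with the Gauss-Seidel iteration. The function name and interface are illustrative and do not come from the monograph.

```python
def coordinate_descent_quadratic(A, b, sweeps=100):
    """Cyclic coordinate descent for f(x) = 0.5 * x^T A x - b^T x.

    A is assumed symmetric positive definite (list of lists), b a list.
    Each inner step minimizes f exactly over one coordinate with the
    others held fixed.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            # Setting the i-th partial derivative to zero gives this update.
            x[i] = (b[i] - off_diag) / A[i][i]
    return x
```

Because each update touches a single coordinate and only one row of `A`, the per-step cost is low, which is part of why such methods suit large-scale problems.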
-
A note on convergence analysis of NURBS curve when weights approach infinity
Authors:
Mao Shi
Abstract:
This article considers the convergence of NURBS curves when the weights approach infinity. We show that the limit of a NURBS curve does not exist when independent variables weights approach infinity. Further, pointwise convergence, uniform convergence, and $L^1$ convergence are studied.
Submitted 15 April, 2016;
originally announced April 2016.
-
Practical Algorithms for Learning Near-Isometric Linear Embeddings
Authors:
Jerry Luo,
Kayla Shapiro,
Hao-Jun Michael Shi,
Qi Yang,
Kan Zhu
Abstract:
We propose two practical non-convex approaches for learning near-isometric, linear embeddings of finite sets of data points. Given a set of training points $\mathcal{X}$, we consider the secant set $S(\mathcal{X})$ that consists of all pairwise difference vectors of $\mathcal{X}$, normalized to lie on the unit sphere. The problem can be formulated as finding a symmetric and positive semi-definite matrix $\boldsymbol{\Psi}$ that preserves the norms of all the vectors in $S(\mathcal{X})$ up to a distortion parameter $δ$. Motivated by non-negative matrix factorization, we reformulate our problem into a Frobenius norm minimization problem, which is solved by the Alternating Direction Method of Multipliers (ADMM) and develop an algorithm, FroMax. Another method solves for a projection matrix $\boldsymbol{\Psi}$ by minimizing the restricted isometry property (RIP) directly over the set of symmetric, positive semi-definite matrices. Applying ADMM and a Moreau decomposition on a proximal mapping, we develop another algorithm, NILE-Pro, for dimensionality reduction. FroMax is shown to converge faster for smaller $δ$ while NILE-Pro converges faster for larger $δ$. Both non-convex approaches are then empirically demonstrated to be more computationally efficient than prior convex approaches for a number of applications in machine learning and signal processing.
Submitted 22 April, 2016; v1 submitted 1 January, 2016;
originally announced January 2016.
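The secant set $S(\mathcal{X})$ from this abstract is straightforward to build. The sketch below constructs it and measures how far a candidate linear map is from preserving the secants' norms. The distortion measure used here, the largest value of $|\,\|Ps\|^2-1\,|$ over the secants $s$, is an assumption made for illustration (it corresponds to $\boldsymbol{\Psi}=P^{T}P$) and may differ from the paper's exact formulation; all function names are hypothetical.

```python
import math
from itertools import combinations


def secant_set(points):
    """All pairwise differences of the points, normalized to unit length."""
    secants = []
    for x, y in combinations(points, 2):
        diff = [a - b for a, b in zip(x, y)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0:  # skip coincident points
            secants.append([d / norm for d in diff])
    return secants


def max_distortion(P, secants):
    """Largest deviation of ||P s||^2 from 1 over the given unit secants."""
    worst = 0.0
    for s in secants:
        image = [sum(p * c for p, c in zip(row, s)) for row in P]
        worst = max(worst, abs(sum(v * v for v in image) - 1.0))
    return worst
```

A map with small `max_distortion` on the secant set approximately preserves all pairwise distances between the training points.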
-
Methods for Quantized Compressed Sensing
Authors:
Hao-Jun Michael Shi,
Mindy Case,
Xiaoyi Gu,
Shenyinying Tu,
Deanna Needell
Abstract:
In this paper, we compare and catalog the performance of various greedy quantized compressed sensing algorithms that reconstruct sparse signals from quantized compressed measurements. We also introduce two new greedy approaches for reconstruction: Quantized Compressed Sampling Matching Pursuit (QCoSaMP) and Adaptive Outlier Pursuit for Quantized Iterative Hard Thresholding (AOP-QIHT). We compare the performance of greedy quantized compressed sensing algorithms for a given bit-depth, sparsity, and noise level.
Submitted 30 December, 2015;
originally announced December 2015.
-
Bootstrap Random Walks
Authors:
Andrea Collevecchio,
Kais Hamza,
Meng Shi
Abstract:
Consider a one dimensional simple random walk $X=(X_n)_{n\geq0}$. We form a new simple symmetric random walk $Y=(Y_n)_{n\geq0}$ by taking sums of products of the increments of $X$ and study the two-dimensional walk $(X,Y)=((X_n,Y_n))_{n\geq0}$. We show that it is recurrent and when suitably normalised converges to a two-dimensional Brownian motion with independent components; this independence occurs despite the functional dependence between the pre-limit processes. The process of recycling increments in this way is repeated and a multi-dimensional analog of this limit theorem together with a transience result are obtained. The construction and results are extended to include the case where the increments take values in a finite set (not necessarily $\{-1,+1\}$).
Submitted 17 August, 2015; v1 submitted 12 August, 2015;
originally announced August 2015.
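The recycling of increments described in this abstract can be sketched as follows. The abstract only says that $Y$ is built from sums of products of the increments of $X$; the code assumes one natural reading, in which the $k$-th increment of $Y$ is the product of the first $k$ increments of $X$. That reading, and the function name, are assumptions for illustration.

```python
from itertools import accumulate
from operator import mul


def bootstrap_walk(increments):
    """Given +1/-1 increments of X, return the paths of X and of Y,
    where Y's k-th increment is the product of X's first k increments."""
    recycled = list(accumulate(increments, mul))  # running products
    x_path = list(accumulate(increments))         # X_1, X_2, ...
    y_path = list(accumulate(recycled))           # Y_1, Y_2, ...
    return x_path, y_path
```

Under this reading the new increments are again $\pm1$, and $Y$ is a deterministic function of $X$, which matches the functional dependence between the pre-limit processes mentioned in the abstract.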