-
Dual Induction CLT for High-dimensional m-dependent Data
Authors:
Heejong Bong,
Arun Kumar Kuchibhotla,
Alessandro Rinaldo
Abstract:
We derive novel and sharp high-dimensional Berry--Esseen bounds for the sum of $m$-dependent random vectors over the class of hyper-rectangles exhibiting only a poly-logarithmic dependence in the dimension. Our results hold under minimal assumptions, such as non-degenerate covariances and finite third moments, and yield a sample complexity of order $\sqrt{m/n}$, aside from logarithmic terms, matching the optimal rates established in the univariate case. When specialized to the sums of independent non-degenerate random vectors, we obtain sharp rates under the weakest possible conditions. On the technical side, we develop an inductive relationship between anti-concentration inequalities and Berry--Esseen bounds, inspired by the classical Lindeberg swapping method and the concentration inequality approach for dependent data, that may be of independent interest.
Submitted 16 November, 2023; v1 submitted 25 June, 2023;
originally announced June 2023.
-
Tight Concentration Inequality for Sub-Weibull Random Variables with Generalized Bernstein Orlicz norm
Authors:
Heejong Bong,
Arun Kumar Kuchibhotla
Abstract:
Recent developments in high-dimensional statistical inference have necessitated concentration inequalities for a broader range of random variables. We focus on sub-Weibull random variables, which extend sub-Gaussian or sub-exponential random variables to allow heavy-tailed distributions. This paper presents concentration inequalities for independent sub-Weibull random variables with finite Generalized Bernstein-Orlicz norms, providing generalized Bernstein's inequalities and Rosenthal-type moment bounds. The tightness of the proposed bounds is shown through lower bounds of the concentration inequalities obtained via the Paley-Zygmund inequality. The results are applied to a graphical model inference problem, improving previous sample complexity bounds.
Submitted 25 February, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
High-dimensional Berry-Esseen Bound for $m$-Dependent Random Samples
Authors:
Heejong Bong,
Arun Kumar Kuchibhotla,
Alessandro Rinaldo
Abstract:
In this work, we provide a $(n/m)^{-1/2}$-rate finite sample Berry-Esseen bound for $m$-dependent high-dimensional random vectors over the class of hyper-rectangles. This bound imposes minimal assumptions on the random vectors such as nondegenerate covariances and finite third moments. The proof uses inductive relationships between anti-concentration inequalities and Berry--Esseen bounds, which are inspired by the telescoping method of Chen and Shao (2004) and the recursion method of Kuchibhotla and Rinaldo (2020). Performing a dual induction based on the relationships, we obtain tight Berry-Esseen bounds for dependent samples.
Submitted 10 December, 2022;
originally announced December 2022.
-
Isomorphisms and properties of TAR reconfiguration graphs for zero forcing and other $X$-set parameters
Authors:
Novi H. Bong,
Joshua Carlson,
Bryan Curtis,
Ruth Haas,
Leslie Hogben
Abstract:
An $X$-TAR (token addition/removal) reconfiguration graph has as its vertices sets that satisfy some property $X$, with an edge between two sets if one is obtained from the other by adding or removing one element. This paper considers the $X$-TAR graph for $X$-sets of vertices of a base graph $G$ where the $X$-sets of $G$ must satisfy certain conditions. Dominating sets, power dominating sets, zero forcing sets, and positive semidefinite zero forcing sets are all examples of $X$-sets. For graphs $G$ and $G'$ with no isolated vertices, it is shown that $G$ and $G'$ have isomorphic $X$-TAR reconfiguration graphs if and only if there is a relabeling of the vertices of $G'$ such that $G$ and $G'$ have exactly the same $X$-sets. The concept of an $X$-irrelevant vertex is introduced to facilitate analysis of $X$-TAR graph isomorphisms. Furthermore, results related to the connectedness of the zero forcing TAR graph are given. We present families of graphs that exceed known lower bounds for connectedness parameters.
Submitted 19 May, 2022;
originally announced May 2022.
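The TAR graph definition in this abstract can be illustrated with a small brute-force sketch. It takes dominating sets as the $X$-sets (one of the examples the abstract names) and is only practical for tiny graphs; the function names `dominating_sets` and `tar_graph` and the adjacency-dict representation are assumptions made for illustration, not the authors' notation or code.

```python
from itertools import combinations

def dominating_sets(adj):
    """Enumerate all dominating sets of a graph given as {vertex: [neighbors]}."""
    vertices = sorted(adj)
    result = []
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            chosen = set(subset)
            # Every vertex must be in the set or adjacent to a member of it.
            if all(v in chosen or chosen & set(adj[v]) for v in vertices):
                result.append(frozenset(subset))
    return result

def tar_graph(x_sets):
    """Edges of the TAR graph: pairs of X-sets differing by exactly one element."""
    x_sets = set(x_sets)
    edges = set()
    for small in x_sets:
        for large in x_sets:
            # One token addition turns `small` into `large`.
            if len(large) == len(small) + 1 and small < large:
                edges.add((small, large))
    return edges

# Hypothetical base graph: the path 0 - 1 - 2.
path3 = {0: [1], 1: [0, 2], 2: [1]}
sets_p3 = dominating_sets(path3)
edges_p3 = tar_graph(sets_p3)
```

For this path, the dominating sets are {1}, {0,1}, {0,2}, {1,2}, and {0,1,2}, so the TAR graph has five vertices and five edges.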
-
Generalized Results for the Existence and Consistency of the MLE in the Bradley-Terry-Luce Model
Authors:
Heejong Bong,
Alessandro Rinaldo
Abstract:
Ranking problems based on pairwise comparisons, such as those arising in online gaming, often involve a large pool of items to order. In these situations, the gap in performance between any two items can be significant, and the smallest and largest winning probabilities can be very close to zero or one. Furthermore, each item may be compared only to a subset of all the items, so that not all pairwise comparisons are observed. In this paper, we study the performance of the Bradley-Terry-Luce model for ranking from pairwise comparison data under more realistic settings than those considered in the literature so far. In particular, we allow for near-degenerate winning probabilities and arbitrary comparison designs. We obtain novel results about the existence of the maximum likelihood estimator (MLE) and the corresponding $\ell_2$ estimation error without the bounded winning probability assumption commonly used in the literature and for arbitrary comparison graph topologies. Central to our approach is the reliance on the Fisher information matrix to express the dependence on the graph topologies and the impact of the values of the winning probabilities on the estimation risk and on the conditions for the existence of the MLE. Our bounds recover existing results as special cases but are more broadly applicable.
Submitted 15 June, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
The Threshold Strong Dimension of a Graph
Authors:
Nadia Benakli,
Novi H Bong,
Shonda M. Dueck,
Linda Eroh,
Beth Novick,
Ortrud R. Oellermann
Abstract:
Let $G$ be a connected graph and $u,v$ and $w$ vertices of $G$. Then $w$ is said to {\em strongly resolve} $u$ and $v$, if there is either a shortest $u$-$w$ path that contains $v$ or a shortest $v$-$w$ path that contains $u$. A set $W$ of vertices of $G$ is a {\em strong resolving set} if every pair of vertices of $G$ is strongly resolved by some vertex of $W$. A smallest strong resolving set of a graph is called a {\em strong basis} and its cardinality, denoted $β_s(G)$, the {\em strong dimension} of $G$. The {\em threshold strong dimension} of a graph $G$, denoted $τ_s(G)$, is the smallest strong dimension among all graphs having $G$ as a spanning subgraph. A graph whose strong dimension equals its threshold strong dimension is called $β_s$-{\em irreducible}. In this paper we establish a geometric characterization for the threshold strong dimension of a graph $G$ that is expressed in terms of the smallest number of paths (each of sufficiently large order) whose strong product admits a certain type of embedding of $G$. We demonstrate that the threshold strong dimension of a graph is not equal to the previously studied threshold dimension of a graph. Graphs with strong dimension $1$ and $2$ are necessarily $β_s$-irreducible. It is well-known that the only graphs with strong dimension $1$ are the paths. We completely describe graphs with strong dimension $2$ in terms of the strong resolving graphs introduced by Oellermann and Peters-Fransen. We obtain sharp upper bounds for the threshold strong dimension of general graphs and determine exact values for this invariant for certain subclasses of trees.
Submitted 10 August, 2020;
originally announced August 2020.
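The definitions of strong resolving set and strong dimension in this abstract can be checked by brute force on small graphs. The sketch below assumes a connected graph with at least two vertices given as an adjacency dict, and uses the fact that a shortest $u$-$w$ path contains $v$ exactly when $d(u,w) = d(u,v) + d(v,w)$; the names `bfs_distances` and `strong_dimension` are hypothetical and the search is exponential in the number of vertices.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def strong_dimension(adj):
    """Size of a smallest strong resolving set, found by exhaustive search."""
    vertices = sorted(adj)
    d = {v: bfs_distances(adj, v) for v in vertices}

    def strongly_resolves(w, u, v):
        # Some shortest u-w path contains v, or some shortest v-w path contains u.
        return d[u][w] == d[u][v] + d[v][w] or d[v][w] == d[v][u] + d[u][w]

    pairs = list(combinations(vertices, 2))
    for k in range(1, len(vertices) + 1):
        for candidate in combinations(vertices, k):
            if all(any(strongly_resolves(w, u, v) for w in candidate)
                   for u, v in pairs):
                return k
    return len(vertices)

# Hypothetical test graphs: a path and a cycle on four vertices.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
dim_path = strong_dimension(path4)
dim_cycle = strong_dimension(cycle4)
```

On the path, a single endpoint strongly resolves every pair, consistent with the abstract's remark that paths have strong dimension $1$; on the 4-cycle no single vertex suffices, so the search returns 2.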
-
Nonparametric Estimation in the Dynamic Bradley-Terry Model
Authors:
Heejong Bong,
Wanshan Li,
Shamindra Shrotriya,
Alessandro Rinaldo
Abstract:
We propose a time-varying generalization of the Bradley-Terry model that allows for nonparametric modeling of dynamic global rankings of distinct teams. We develop a novel estimator that relies on kernel smoothing to pre-process the pairwise comparisons over time and is applicable in sparse settings where the Bradley-Terry model may not be fit. We obtain necessary and sufficient conditions for the existence and uniqueness of our estimator. We also derive time-varying oracle bounds for both the estimation error and the excess risk in the model-agnostic setting where the Bradley-Terry model is not necessarily the true data generating process. We thoroughly test the practical effectiveness of our model using both simulated and real world data and suggest an efficient data-driven approach for bandwidth tuning.
Submitted 28 February, 2020;
originally announced March 2020.