-
Gradient descent in matrix factorization: Understanding large initialization
Authors:
Hengchao Chen,
Xin Chen,
Mohamad Elmasri,
Qiang Sun
Abstract:
Gradient Descent (GD) has been proven effective in solving various matrix factorization problems. However, its optimization behavior with large initial values remains less understood. To address this gap, this paper presents a novel theoretical framework for examining the convergence trajectory of GD with a large initialization. The framework is grounded in signal-to-noise ratio concepts and inductive arguments. The results uncover an implicit incremental learning phenomenon in GD and offer a deeper understanding of its performance in large initialization scenarios.
Submitted 31 May, 2024; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Parallel sampling of decomposable graphs using Markov chain on junction trees
Authors:
Mohamad Elmasri
Abstract:
Bayesian inference for undirected graphical models is mostly restricted to the class of decomposable graphs, as they enjoy a rich set of properties making them amenable to high-dimensional problems. While parameter inference is straightforward in this setup, inferring the underlying graph is a challenge driven by the computational difficulty in exploring the space of decomposable graphs. This work makes two contributions to address this problem. First, we provide necessary and sufficient conditions for when multi-edge perturbations maintain decomposability of the graph. Using these, we characterize a simple class of partitions that efficiently classify all edge perturbations by whether they maintain decomposability. Second, we propose a novel parallel non-reversible Markov chain Monte Carlo sampler for distributions over junction tree representations of the graph. At every step, the parallel sampler simultaneously executes all edge perturbations within a partition. Through simulations, we demonstrate the efficiency of our new edge perturbation conditions and class of partitions. We find that our parallel sampler yields improved mixing properties in comparison to the single-move variant, and outperforms current state-of-the-art methods in terms of accuracy and computational efficiency. The implementation of our work is available in the Python package parallelDG.
Submitted 31 December, 2023; v1 submitted 5 September, 2022;
originally announced September 2022.
-
Optimal projection to improve parametric importance sampling in high dimension
Authors:
Maxime ElMasri,
Jérôme Morio,
Florian Simatos
Abstract:
In this paper we propose a dimension-reduction strategy to improve the performance of importance sampling in high dimension. The idea is to estimate variance terms in a small number of suitably chosen directions. We first prove that the optimal directions, i.e., the ones that minimize the Kullback-Leibler divergence with the optimal auxiliary density, are the eigenvectors associated with extreme (small or large) eigenvalues of the optimal covariance matrix. We then perform extensive numerical experiments showing that, as dimension increases, these directions give estimates that are very close to optimal. Moreover, we show that the estimation remains accurate even when a simple empirical estimator of the covariance matrix is used to estimate these directions. These theoretical and numerical results open the way for different generalizations, in particular the incorporation of such ideas in adaptive importance sampling schemes.
Submitted 23 March, 2022; v1 submitted 13 July, 2021;
originally announced July 2021.
-
Predictive inference for travel time on transportation networks
Authors:
Mohamad Elmasri,
Aurelie Labbe,
Denis Larocque,
Laurent Charlin
Abstract:
Recent statistical methods fitted on large-scale GPS data can provide accurate estimations of the expected travel time between two points. However, little is known about the distribution of travel time, which is key to decision-making across a number of logistic problems. With sufficient data, single road-segment travel time can be well approximated. The challenge lies in understanding how to aggregate such information over a route to arrive at the route-distribution of travel time. We develop a novel statistical approach to this problem. We show that, under general conditions, without assuming a distribution of speed, travel time divided by route distance follows a Gaussian distribution with route-invariant population mean and variance. We develop efficient inference methods for such parameters and propose asymptotically tight population prediction intervals for travel time. Using traffic flow information, we further develop a trip-specific Gaussian-based predictive distribution, resulting in tight prediction intervals for short and long trips. Our methods, implemented in an R-package, are illustrated in a real-world case study using mobile GPS data, showing that our trip-specific and population intervals both achieve the 95% theoretical coverage levels. Compared to alternative approaches, our trip-specific predictive distribution achieves (a) the theoretical coverage at every level of significance, (b) tighter prediction intervals, (c) less predictive bias, and (d) more efficient estimation and prediction procedures. This makes our approach promising for low-latency, large-scale transportation applications.
Submitted 19 March, 2023; v1 submitted 23 April, 2020;
originally announced April 2020.
-
Sub-clustering in decomposable graphs and size-varying junction trees
Authors:
Mohamad Elmasri
Abstract:
This paper proposes a novel representation of decomposable graphs based on semi-latent tree-dependent bipartite graphs. The novel representation has two main benefits. First, it enables a form of sub-clustering within maximal cliques of the graph, adding informational richness to the general use of decomposable graphs that could be harnessed in applications with behavioural data. Second, it allows for a new node-driven Markov chain Monte Carlo sampler of decomposable graphs that can easily parallelize and scale. The proposed sampler also benefits from the computational efficiency of junction-tree-based samplers of decomposable graphs.
Submitted 4 December, 2017;
originally announced December 2017.
-
On decomposable random graphs
Authors:
Mohamad Elmasri
Abstract:
Decomposable graphs are known for their tedious and complicated Markov update steps. Instead of modelling them directly, this work introduces a class of tree-dependent bipartite graphs that span the projective space of decomposable graphs. This is achieved through dimensionality expansion that causes the graph nodes to be conditionally independent given a latent tree. The Markov update steps are thus remarkably simplified. Structural modelling with tree-dependent bipartite graphs has additional benefits. For example, certain properties that are hardly attainable in the decomposable form are now easily accessible. Moreover, tree-dependent bipartite graphs can extract and model extra information related to sub-clustering dynamics, while currently known models for decomposable graphs do not. Properties of decomposable graphs are also transferable to the expanded dimension, such as the attractive likelihood factorization property. As a result of using the bipartite representation, tools developed for random graphs can be used. Hence, a framework for random tree-dependent bipartite graphs, thereupon for random decomposable graphs, is proposed.
Submitted 9 October, 2017;
originally announced October 2017.
-
A Skew-Normal Copula-Driven GLMM
Authors:
Kalyan Das,
Mohamad Elmasri,
Arusharka Sen
Abstract:
This paper presents a method for fitting a copula-driven generalized linear mixed model. For added flexibility, the skew-normal copula is adopted for fitting. The correlation matrix of the skew-normal copula is used to capture the dependence structure within units, while the fixed and random effects coefficients are estimated through the mean of the copula. For estimation, a Monte Carlo expectation-maximization algorithm is developed. Simulations are shown alongside a real data example from the Framingham Heart Study.
Submitted 29 July, 2017;
originally announced July 2017.
-
A hierarchical Bayesian model for predicting ecological interactions using scaled evolutionary relationships
Authors:
Mohamad Elmasri,
Maxwell J. Farrell,
T. Jonathan Davies,
David A. Stephens
Abstract:
Identifying undocumented or potential future interactions among species is a challenge facing modern ecologists. Recent link prediction methods rely on trait data; however, large species interaction databases are typically sparse and covariates are limited to only a fraction of species. On the other hand, evolutionary relationships, encoded as phylogenetic trees, can act as proxies for underlying traits and historical patterns of parasite sharing among hosts. We show that, using a network-based conditional model, phylogenetic information provides strong predictive power in a recently published global database of host-parasite interactions. By scaling the phylogeny using an evolutionary model, our method allows for biological interpretation often missing from latent variable models. To further improve on the phylogeny-only model, we combine a hierarchical Bayesian latent score framework for bipartite graphs that accounts for the number of interactions per species with the host dependence informed by phylogeny. Combining the two information sources yields significant improvement in predictive accuracy over each of the submodels alone. As many interaction networks are constructed from presence-only data, we extend the model by integrating a correction mechanism for missing interactions, which proves valuable in reducing uncertainty in unobserved interactions.
Submitted 19 September, 2019; v1 submitted 26 July, 2017;
originally announced July 2017.