-
Adaptive debiased SGD in high-dimensional GLMs with streaming data
Authors:
Ruijian Han,
Lan Luo,
Yuanhang Luo,
Yuanyuan Lin,
Jian Huang
Abstract:
Online statistical inference facilitates real-time analysis of sequentially collected data, making it different from traditional methods that rely on static datasets. This paper introduces a novel approach to online inference in high-dimensional generalized linear models, where we update regression coefficient estimates and their standard errors upon each new data arrival. In contrast to existing methods that either require full dataset access or large-dimensional summary statistics storage, our method operates in a single-pass mode, significantly reducing both time and space complexity. The core of our methodological innovation lies in an adaptive stochastic gradient descent algorithm tailored for dynamic objective functions, coupled with a novel online debiasing procedure. This allows us to maintain low-dimensional summary statistics while effectively controlling optimization errors introduced by the dynamically changing loss functions. We demonstrate that our method, termed the Approximated Debiased Lasso (ADL), not only mitigates the need for the bounded individual probability condition but also significantly improves numerical performance. Numerical experiments demonstrate that the proposed ADL method consistently exhibits robust performance across various covariance matrix structures.
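The abstract does not give ADL's update equations, so the sketch below is only a generic point of reference. It is a single-pass proximal SGD for an L1-penalized logistic regression (one GLM). Each new observation triggers one gradient step followed by soft-thresholding and is then discarded. It is not ADL: it has no adaptive step-size rule and no online debiasing. The names `online_prox_sgd_logistic`, `eta0` and `lam`, and the 1/sqrt(t) step size, are hypothetical choices.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, tau):
    """Proximal operator of tau * |z|: shrink z toward zero by tau."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def online_prox_sgd_logistic(stream, p, lam=0.01, eta0=0.5):
    """One pass over (x, y) pairs, y in {0, 1}; no past observation is kept.

    Hypothetical helper, not ADL: plain proximal SGD for L1-penalized
    logistic regression with an assumed eta0 / sqrt(t) step size.
    """
    beta = [0.0] * p
    for t, (x, y) in enumerate(stream, start=1):
        eta = eta0 / math.sqrt(t)
        prob = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
        # Gradient of the logistic negative log-likelihood at this one observation.
        grad = [(prob - y) * xi for xi in x]
        # Gradient step followed by the L1 proximal (soft-thresholding) step.
        beta = [soft_threshold(b - eta * g, eta * lam) for b, g in zip(beta, grad)]
    return beta
```

Memory stays O(p) however long the stream runs. Debiasing and online standard errors, which are the point of the paper, are not attempted here.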
Submitted 1 June, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework
Authors:
Lan Luo,
Chengchun Shi,
Jitao Wang,
Zhenke Wu,
Lexin Li
Abstract:
Mediation analysis is an important analytic tool commonly used in a broad range of scientific applications. In this article, we study the problem of mediation analysis when there are multivariate and conditionally dependent mediators, and when the variables are observed over multiple time points. The problem is challenging, because the effect of a mediator involves not only the path from the treatment to this mediator itself at the current time point, but also all possible paths pointed to this mediator from its upstream mediators, as well as the carryover effects from all previous time points. We propose a novel multivariate dynamic mediation analysis approach. Drawing inspiration from the Markov decision process model that is frequently employed in reinforcement learning, we introduce a Markov mediation process paired with a system of time-varying linear structural equation models to formulate the problem. We then formally define the individual mediation effect, built upon the idea of simultaneous interventions and intervention calculus. We next derive the closed-form expression and propose an iterative estimation procedure under the Markov mediation process model. We study both the asymptotic property and the empirical performance of the proposed estimator, and further illustrate our method with a mobile health application.
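The path-counting idea (direct paths from the treatment plus paths through upstream mediators) can be made concrete for a single time point. The sketch assumes one linear structural equation model whose mediators are ordered so that each depends only on the treatment and on upstream mediators. The mediator equation is $M = BM + \alpha A + \varepsilon$ with $B$ strictly lower triangular, and the outcome equation is $Y = \beta A + \gamma^\top M + e$. The total effect of $A$ on the mediators is $(I - B)^{-1}\alpha$, computed below by forward substitution. The effect transmitted through all mediators is therefore $\gamma^\top (I - B)^{-1}\alpha$. This ignores carryover across time points and is neither the paper's Markov mediation process nor its individual mediation effect. The function name is hypothetical.

```python
def linear_sem_effects(alpha, B, gamma, beta):
    """Total effects in a hypothetical single-time-point linear SEM.

    Mediators: M_j = alpha[j] * A + sum_{k<j} B[j][k] * M_k + noise (B strictly lower triangular).
    Outcome:   Y   = beta * A + sum_j gamma[j] * M_j + noise.
    Returns (tau, indirect, total), where tau[j] is the total effect of A on M_j.
    """
    q = len(alpha)
    tau = [0.0] * q
    for j in range(q):
        # Forward substitution solves (I - B) tau = alpha: direct path plus upstream paths.
        tau[j] = alpha[j] + sum(B[j][k] * tau[k] for k in range(j))
    indirect = sum(g * t for g, t in zip(gamma, tau))
    return tau, indirect, beta + indirect
```

For example, take two mediators with $M_2$ downstream of $M_1$: `alpha=[1.0, 0.5]`, `B=[[0, 0], [0.8, 0]]`, `gamma=[2.0, 3.0]` and `beta=0.4`. Then $M_2$ receives $0.5 + 0.8 \cdot 1.0 = 1.3$, the mediated effect is $2 \cdot 1 + 3 \cdot 1.3 = 5.9$, and the total effect is $6.3$.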
Submitted 24 October, 2023;
originally announced October 2023.
-
Accelerating Inexact HyperGradient Descent for Bilevel Optimization
Authors:
Haikuo Yang,
Luo Luo,
Chris Junchi Li,
Michael I. Jordan
Abstract:
We present a method for solving general nonconvex-strongly-convex bilevel optimization problems. Our method -- the \emph{Restarted Accelerated HyperGradient Descent} (\texttt{RAHGD}) method -- finds an $ε$-first-order stationary point of the objective with $\tilde{\mathcal{O}}(κ^{3.25}ε^{-1.75})$ oracle complexity, where $κ$ is the condition number of the lower-level objective and $ε$ is the desired accuracy. We also propose a perturbed variant of \texttt{RAHGD} for finding an $\big(ε,\mathcal{O}(κ^{2.5}\sqrtε\,)\big)$-second-order stationary point within the same order of oracle complexity. Our results achieve the best-known theoretical guarantees for finding stationary points in bilevel optimization and also improve upon the existing upper complexity bound for finding second-order stationary points in nonconvex-strongly-concave minimax optimization problems, setting a new state-of-the-art benchmark. Empirical studies are conducted to validate the theoretical results in this paper.
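RAHGD adds acceleration and restarts on top of hypergradient descent. As a much simpler illustration of a single inexact hypergradient step, the sketch below runs plain hypergradient descent on a hypothetical scalar toy problem. The lower level $g(x, y) = \tfrac{\mu}{2}(y - x)^2$ is $\mu$-strongly convex in $y$ with $y^*(x) = x$. The upper level is $f(x, y) = \tfrac12 (y - 1)^2 + \tfrac{\lambda}{2} x^2$. Each outer step approximates $y^*(x)$ with a few warm-started gradient steps on $g$. It then applies the implicit-function hypergradient $\nabla_x f - \nabla^2_{xy} g\,[\nabla^2_{yy} g]^{-1} \nabla_y f$. For $\lambda = 1$ the minimizer of $\Phi(x) = f(x, y^*(x))$ is $x = 1/2$. All names and step sizes are assumptions.

```python
def inexact_hypergradient_descent(x0, y0, mu=2.0, lam=1.0, outer_steps=300,
                                  inner_steps=10, eta_x=0.3, eta_y=0.25):
    """Plain (non-accelerated, non-restarted) hypergradient descent on a toy problem.

    Lower level: g(x, y) = (mu / 2) * (y - x)**2, so y*(x) = x.
    Upper level: f(x, y) = 0.5 * (y - 1)**2 + 0.5 * lam * x**2.
    """
    x, y = x0, y0
    g_xy, g_yy = -mu, mu  # second derivatives of g (constant for this quadratic)
    for _ in range(outer_steps):
        # Inexact lower-level solve: a few warm-started gradient steps on g(x, .).
        for _ in range(inner_steps):
            y -= eta_y * mu * (y - x)
        grad_x_f = lam * x
        grad_y_f = y - 1.0
        # Implicit-function hypergradient, evaluated at the approximate y.
        hypergrad = grad_x_f - g_xy / g_yy * grad_y_f
        x -= eta_x * hypergrad
    return x, y
```

Warm-starting `y` from the previous outer iteration keeps the inner loop short. This is the basic reason inexact schemes can be cheaper than solving the lower level exactly at every step.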
Submitted 30 June, 2023;
originally announced July 2023.
-
Evaluating the Impact of Automated Vehicles on Residential Location Distribution using Activity-based Accessibility: A Case Study of Japanese Regional Areas
Authors:
Lichen Luo,
Giancarlos Parady,
Kiyoshi Takami
Abstract:
Automated Vehicles (AVs) are expected to disrupt the transport sector in the future. Extensive research efforts have been dedicated to studying their potential implications. However, the existing literature remains limited regarding long-term impacts. To fill this gap, this paper estimates and validates a residential location choice model to evaluate the impacts of AVs on residential location distributions in the context of a Japanese regional area. Activity-based accessibility is used to reflect the changes in transport costs brought about by AVs. The year 2040 is set as the backdrop for the analyses, where the effects of the decreased population are reflected in the scenario settings, along with some other variables to accommodate the uncertainties in the characteristics of AVs. The simulation results confirm the potential for urban expansion. The results demonstrate that, compared to the Base Scenario, the median distances between the residences and the closest Dwelling Attraction Areas expand by 7.2% and 41.6% for the two AV scenarios, respectively. Two hypothetical policy mandates are then applied to alleviate the problem. The results suggest that providing a 20% subsidy to the land price is effective for the scenario with relatively conservative AV settings, as the median distance indicator can be restored to the level of the Base Scenario.
Submitted 16 December, 2022;
originally announced December 2022.
-
Statistical Inference for Streamed Longitudinal Data
Authors:
Lan Luo,
Jingshen Wang,
Emily C. Hector
Abstract:
Modern longitudinal data, for example from wearable devices, measures biological signals on a fixed set of participants at a diverging number of time points. Traditional statistical methods are not equipped to handle the computational burden of repeatedly analyzing the cumulatively growing dataset each time new data is collected. We propose a new estimation and inference framework for dynamic updating of point estimates and their standard errors across serially collected dependent datasets. The key technique is a decomposition of the extended score function of the quadratic inference function constructed over the cumulative longitudinal data into a sum of summary statistics over data batches. We show how this sum can be recursively updated without the need to access the whole dataset, resulting in a computationally efficient streaming procedure with minimal loss of statistical efficiency. We prove consistency and asymptotic normality of our streaming estimator as the number of data batches diverges, even as the number of independent participants remains fixed. Simulations highlight the advantages of our approach over traditional statistical methods that assume independence between data batches. Finally, we investigate the relationship between physical activity and several diseases through the analysis of accelerometry data from the National Health and Nutrition Examination Survey.
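The central idea is that a cumulative estimating equation can be written as a sum of per-batch summary statistics. Its simplest form is shown below. The sketch assumes independent observations and ordinary least squares for a single covariate, not the paper's quadratic inference function for dependent longitudinal data. Each batch only adds to five running sums. The estimate computed from those sums matches the offline fit on all the data, and no raw observations are stored. The class name `RunningLeastSquares` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningLeastSquares:
    """Running sums for y = b0 + b1 * x; raw observations are never stored."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, batch):
        """Fold one batch of (x, y) pairs into the running sums."""
        for x, y in batch:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def estimate(self):
        """Return (intercept, slope) computed from the summary statistics alone."""
        xbar = self.sx / self.n
        ybar = self.sy / self.n
        slope = (self.sxy - self.n * xbar * ybar) / (self.sxx - self.n * xbar * xbar)
        return ybar - slope * xbar, slope
```

Because the normal equations are sums over observations, splitting the stream into batches of any size leaves the estimate unchanged.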
Submitted 4 August, 2022;
originally announced August 2022.
-
COMET Flows: Towards Generative Modeling of Multivariate Extremes and Tail Dependence
Authors:
Andrew McDonald,
Pang-Ning Tan,
Lifeng Luo
Abstract:
Normalizing flows, a popular class of deep generative models, often fail to represent extreme phenomena observed in real-world processes. In particular, existing normalizing flow architectures struggle to model multivariate extremes, characterized by heavy-tailed marginal distributions and asymmetric tail dependence among variables. In light of this shortcoming, we propose COMET (COpula Multivariate ExTreme) Flows, which decompose the process of modeling a joint distribution into two parts: (i) modeling its marginal distributions, and (ii) modeling its copula distribution. COMET Flows capture heavy-tailed marginal distributions by combining a parametric tail belief at extreme quantiles of the marginals with an empirical kernel density function at mid-quantiles. In addition, COMET Flows capture asymmetric tail dependence among multivariate extremes by viewing such dependence as inducing a low-dimensional manifold structure in feature space. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of COMET Flows in capturing both heavy-tailed marginals and asymmetric tail dependence compared to other state-of-the-art baseline architectures. All code is available on GitHub at https://github.com/andrewmcdonald27/COMETFlows.
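One simple way to combine a parametric tail with a kernel density body is to splice the two CDFs at a threshold. The sketch below uses a Gaussian-kernel density estimate below a threshold $u$. Above $u$ it uses a generalized Pareto tail with shape $\xi > 0$ and scale $\sigma$, rescaled so the CDF stays continuous at $u$. The GPD tail family, the fixed parameters and the function names are assumptions for illustration only. How COMET Flows actually fits its tails and couples them with the copula flow is described in the paper. Only the upper tail is spliced here; a lower tail would be handled symmetrically.

```python
import math


def kde_cdf(x, data, bandwidth):
    """CDF of a Gaussian kernel density estimate built on `data`."""
    scale = bandwidth * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - d) / scale)) for d in data) / len(data)


def spliced_cdf(x, data, bandwidth, threshold, xi, sigma):
    """KDE body below `threshold`, generalized Pareto upper tail above it (xi > 0)."""
    if x <= threshold:
        return kde_cdf(x, data, bandwidth)
    # Probability mass the KDE leaves above the threshold is handed to the tail.
    tail_mass = 1.0 - kde_cdf(threshold, data, bandwidth)
    # GPD survival function of the excess over the threshold: polynomial decay.
    survival = (1.0 + xi * (x - threshold) / sigma) ** (-1.0 / xi)
    return 1.0 - tail_mass * survival
```

Far above the threshold, the Gaussian KDE's survival probability vanishes. The spliced CDF keeps polynomially decaying mass there, which is the heavy-tail behavior a pure KDE cannot represent.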
Submitted 2 May, 2022;
originally announced May 2022.
-
Online Causal Inference with Application to Near Real-Time Post-Market Vaccine Safety Surveillance
Authors:
Xu Shi,
Lan Luo
Abstract:
Streaming data routinely generated by mobile phones, social networks, e-commerce, and electronic health records present new opportunities for near real-time surveillance of the impact of an intervention on an outcome of interest via causal inference methods. However, as data grow rapidly in volume and velocity, storing and combining data become increasingly challenging. The amount of time and effort spent to update analyses can grow exponentially, which defeats the purpose of instantaneous surveillance. Data sharing barriers in multi-center studies bring additional challenges to rapid signal detection and update. It is thus time to shift from static causal inference to online causal learning that can incorporate new information as it becomes available without revisiting prior observations. In this paper, we present a framework for online estimation and inference of treatment effects leveraging a series of datasets that arrive sequentially without storing or re-accessing individual-level raw data. We establish estimation consistency and asymptotic normality of the proposed framework for online causal inference. In particular, our framework is robust to biased data batches in the sense that the proposed online estimator is asymptotically unbiased as long as the pooled data is a random sample of the target population, regardless of whether each individual data batch is. We also provide an R package for analyzing streaming observational data that is computationally efficient compared with existing software packages for offline analyses. Our proposed methods are illustrated with extensive simulations and an application to sequential monitoring of adverse events following COVID-19 vaccination.
Submitted 26 November, 2021;
originally announced November 2021.
-
Parallel-and-stream accelerator for computationally fast supervised learning
Authors:
Emily C. Hector,
Lan Luo,
Peter X. -K. Song
Abstract:
Two dominant distributed computing strategies have emerged to overcome the computational bottleneck of supervised learning with big data: parallel data processing in the MapReduce paradigm and serial data processing in the online streaming paradigm. Despite the two strategies' common divide-and-combine approach, they differ in how they aggregate information, leading to different trade-offs between statistical and computational performance. In this paper, we propose a new hybrid paradigm, termed a Parallel-and-Stream Accelerator (PASA), that uses the strengths of both strategies for computationally fast and statistically efficient supervised learning. PASA's architecture nests online streaming processing into each distributed and parallelized data process in a MapReduce framework. PASA leverages the advantages and mitigates the disadvantages of both the MapReduce and online streaming approaches to deliver a more flexible paradigm satisfying practical computing needs. We study the analytic properties and computational complexity of PASA, and detail its implementation for two key statistical learning tasks. We illustrate its performance through simulations and a large-scale data example building a prediction model for online purchases from advertising data.
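PASA nests streaming updates inside a MapReduce job. That architecture can be sketched with a deliberately simple statistic. In the map step, each hypothetical partition is processed as a stream with Welford's one-pass mean/variance update. In the reduce step, the per-partition accumulators are merged with the standard pairwise combination formula for these accumulators. This illustrates the architecture only. PASA's actual supervised-learning estimators and aggregation rules are given in the paper, and all names here are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Moments:
    """Streaming accumulator: count, mean and sum of squared deviations (m2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def push(self, x):
        """Welford's one-pass update with a single new value."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def combine(a, b):
    """Merge two accumulators exactly, as if their data had been streamed together."""
    n = a.n + b.n
    if n == 0:
        return Moments()
    delta = b.mean - a.mean
    return Moments(n, a.mean + delta * b.n / n, a.m2 + b.m2 + delta * delta * a.n * b.n / n)


def stream_partition(values):
    """Map side: one worker streams over its own partition."""
    acc = Moments()
    for v in values:
        acc.push(v)
    return acc


def parallel_and_stream(partitions):
    """Reduce side: merge the per-partition accumulators.

    Partitions are processed sequentially here; each stream_partition call is
    independent, so in a real deployment they could run on separate workers.
    """
    return reduce(combine, map(stream_partition, partitions), Moments())
```

The merge is exact, so the result does not depend on how the data were split across partitions.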
Submitted 29 October, 2021;
originally announced November 2021.
-
Real-Time Regression Analysis of Streaming Clustered Data With Possible Abnormal Data Batches
Authors:
Lan Luo,
Ling Zhou,
Peter X. -K. Song
Abstract:
This paper develops an incremental learning algorithm based on the quadratic inference function (QIF) to analyze streaming datasets with correlated outcomes such as longitudinal data and clustered data. We propose a renewable QIF (RenewQIF) method within a paradigm of renewable estimation and incremental inference, in which parameter estimates are recursively renewed with current data and summary statistics of historical data, but with no use of any historical subject-level raw data. We compare our renewable estimation method with both the offline QIF and the offline generalized estimating equations (GEE) approaches, which process the entire cumulative subject-level data, and show theoretically and numerically that our renewable procedure enjoys statistical and computational efficiency. We also propose an approach to diagnose the homogeneity assumption of regression coefficients via a sequential goodness-of-fit test, which serves as a screening procedure for occurrences of abnormal data batches. We implement the proposed methodology by extending Spark's existing Lambda architecture to support statistical inference and data quality diagnosis. We illustrate the proposed methodology by extensive simulation studies and an analysis of streaming car crash datasets from the National Automotive Sampling System-Crashworthiness Data System (NASS CDS).
Submitted 29 June, 2021;
originally announced June 2021.
-
Revisiting Co-Occurring Directions: Sharper Analysis and Efficient Algorithm for Sparse Matrices
Authors:
Luo Luo,
Cheng Chen,
Guangzeng Xie,
Haishan Ye
Abstract:
We study the streaming model for approximate matrix multiplication (AMM). We are interested in the scenario where the algorithm can take only one pass over the data with limited memory. The state-of-the-art deterministic sketching algorithm for streaming AMM is co-occurring directions (COD), which has much smaller approximation errors than randomized algorithms and empirically outperforms other deterministic sketching methods. In this paper, we provide a tighter error bound for COD whose leading term accounts for the potential approximate low-rank structure and the correlation of the input matrices. We prove that COD is space optimal with respect to our improved error bound. We also propose a variant of COD for sparse matrices with theoretical guarantees. Experiments on real-world sparse datasets show that the proposed algorithm is more efficient than baseline methods.
Submitted 17 December, 2020; v1 submitted 5 September, 2020;
originally announced September 2020.
-
Decomposition of the Total Effect for Two Mediators: A Natural Counterfactual Interaction Effect Framework
Authors:
Xin Gao,
Li Li,
Li Luo
Abstract:
Mediation analysis has been used in many disciplines to explain the mechanism or process that underlies an observed relationship between an exposure variable and an outcome variable via the inclusion of mediators. Decompositions of the total causal effect of an exposure variable into effects characterizing mediation pathways and interactions have gained increasing interest in the last decade. In this work, we develop decompositions for scenarios where the two mediators are causally sequential or non-sequential. Current developments in this area have primarily focused either on decompositions without interaction components or on decompositions with interactions that assume no causally sequential order between the mediators. We propose a new concept, the natural counterfactual interaction effect, which captures the two-way and three-way interactions in both scenarios and extends the two-way mediated interactions in the literature. We develop a unified approach for decomposing the total effect, within the counterfactual framework, into the effects that are due to mediation only, interaction only, both mediation and interaction, and neither mediation nor interaction. Finally, we illustrate the proposed decomposition method using a real data analysis where the two mediators are causally sequential.
Submitted 30 July, 2020;
originally announced July 2020.
-
METEOR: Learning Memory and Time Efficient Representations from Multi-modal Data Streams
Authors:
Amila Silva,
Shanika Karunasekera,
Christopher Leckie,
Ling Luo
Abstract:
Many learning tasks involve multi-modal data streams, where continuous data from different modes convey a comprehensive description of objects. A major challenge in this context is how to efficiently interpret multi-modal information in complex environments. This has motivated numerous studies on learning unsupervised representations from multi-modal data streams. These studies aim to understand higher-level contextual information (e.g., a Twitter message) by jointly learning embeddings for the lower-level semantic units in different modalities (e.g., text, user, and location of a Twitter message). However, these methods directly associate each low-level semantic unit with a continuous embedding vector, which results in high memory requirements. Hence, deploying and continuously learning such models in low-memory devices (e.g., mobile devices) becomes a problem. To address this problem, we present METEOR, a novel MEmory and Time Efficient Online Representation learning technique, which: (1) learns compact representations for multi-modal data by sharing parameters within semantically meaningful groups and preserves the domain-agnostic semantics; (2) can be accelerated using parallel processes to accommodate different stream rates while capturing the temporal changes of the units; and (3) can be easily extended to capture implicit/explicit external knowledge related to multi-modal data streams. We evaluate METEOR using two types of multi-modal data streams (i.e., social media streams and shopping transaction streams) to demonstrate its ability to adapt to different domains. Our results show that METEOR preserves the quality of the representations while reducing memory usage by around 80% compared to the conventional memory-intensive embeddings.
Submitted 23 July, 2020;
originally announced July 2020.
-
OMBA: User-Guided Product Representations for Online Market Basket Analysis
Authors:
Amila Silva,
Ling Luo,
Shanika Karunasekera,
Christopher Leckie
Abstract:
Market Basket Analysis (MBA) is a popular technique to identify associations between products, which is crucial for business decision making. Previous studies typically adopt conventional frequent itemset mining algorithms to perform MBA. However, they generally fail to uncover rarely occurring associations among the products at their most granular level. Also, they have limited ability to capture temporal dynamics in associations between products. Hence, we propose OMBA, a novel representation learning technique for Online Market Basket Analysis. OMBA jointly learns representations for products and users such that they preserve the temporal dynamics of product-to-product and user-to-product associations. Subsequently, OMBA proposes a scalable yet effective online method to generate products' associations using their representations. Our extensive experiments on three real-world datasets show that OMBA outperforms state-of-the-art methods by as much as 21%, while emphasizing rarely occurring strong associations and effectively capturing temporal changes in associations.
Submitted 16 February, 2021; v1 submitted 18 June, 2020;
originally announced June 2020.
-
Multi-consensus Decentralized Accelerated Gradient Descent
Authors:
Haishan Ye,
Luo Luo,
Ziang Zhou,
Tong Zhang
Abstract:
This paper considers the decentralized convex optimization problem, which has a wide range of applications in large-scale machine learning, sensor networks, and control theory. We propose novel algorithms that achieve optimal computation complexity and near-optimal communication complexity. Our theoretical results give an affirmative answer to the open problem of whether there exists an algorithm that can achieve a communication complexity (nearly) matching the lower bound depending on the global condition number instead of the local one. Furthermore, the linear convergence of our algorithms depends only on the strong convexity of the global objective and does \emph{not} require the local functions to be convex. The design of our methods relies on a novel integration of well-known techniques including Nesterov's acceleration, multi-consensus and gradient-tracking. Empirical studies demonstrate the superior performance of our methods in machine learning applications.
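Two of the ingredients named in the abstract, multi-consensus and gradient tracking, can be illustrated in a stripped-down form. The sketch runs gradient tracking with several gossip (consensus) rounds per iteration on a hypothetical ring of four agents. Each agent holds a scalar quadratic $f_i(x) = \tfrac{a_i}{2}(x - b_i)^2$. The sketch omits Nesterov acceleration and any accelerated consensus scheme, so it is not the paper's algorithm, and the local functions here are convex for simplicity. It only shows how tracked gradients let every agent converge to the minimizer of $\sum_i f_i$, which here is $\sum_i a_i b_i / \sum_i a_i$.

```python
def mix(values, rounds):
    """Run `rounds` gossip steps on a ring: each agent averages itself and its two neighbors."""
    n = len(values)  # assumes n >= 3 so the two neighbors are distinct
    for _ in range(rounds):
        values = [(values[i - 1] + values[i] + values[(i + 1) % n]) / 3.0 for i in range(n)]
    return values


def gradient_tracking(a, b, eta=0.1, consensus_rounds=3, iters=500):
    """Gradient tracking with multiple consensus rounds per iteration (no acceleration).

    Agent i holds f_i(x) = 0.5 * a[i] * (x - b[i])**2 and only talks to its ring neighbors.
    """
    n = len(a)

    def local_grads(xs):
        return [a[i] * (xs[i] - b[i]) for i in range(n)]

    x = [0.0] * n
    g = local_grads(x)
    s = list(g)  # s_i tracks the network-average gradient
    for _ in range(iters):
        x_new = mix([x[i] - eta * s[i] for i in range(n)], consensus_rounds)
        g_new = local_grads(x_new)
        s_mixed = mix(s, consensus_rounds)
        s = [s_mixed[i] + g_new[i] - g[i] for i in range(n)]
        x, g = x_new, g_new
    return x
```

With `a=[1, 2, 3, 4]` and `b=[4, -2, 1, 0.5]`, the global minimizer is $5/10 = 0.5$, and every agent's iterate approaches it. Raising `consensus_rounds` trades extra communication per iteration for closer agreement between agents.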
Submitted 10 October, 2023; v1 submitted 2 May, 2020;
originally announced May 2020.
-
Decomposition of Total Effect with the Notion of Natural Counterfactual Interaction Effect
Authors:
Xin Gao,
Li Li,
Li Luo
Abstract:
Mediation analysis serves as a crucial tool for drawing causal inferences based on directed acyclic graphs and has been widely employed in biomedical science, social science, epidemiology and psychology. Decomposition of the total effect provides deep insight into the causal contribution of each path and interaction term. Since the four-way decomposition method was proposed to identify the mediated interaction effect in the counterfactual framework, the idea has been extended to a more sophisticated scenario with non-sequential multiple mediators. However, the method exhibits limitations, such as inappropriate modeling of dependence and non-identifiability, when the causal structure contains direct causal edges between mediators. We develop the notion of natural counterfactual interaction effect and find that the decomposition of the total effect can be consistently realized with our proposed notion. Furthermore, the natural counterfactual interaction effect overcomes these drawbacks and has a clear and meaningful interpretation, which may greatly improve researchers' capacity to analyze highly complex causal structures.
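For reference, the single-mediator four-way decomposition that the abstract builds on can be checked at the individual level. Take a binary exposure $A$ and a binary mediator $M$, with potential outcomes $Y_{am}$ and potential mediators $M_a$. The decomposition splits $Y_{1M_1} - Y_{0M_0}$ into four pieces: a controlled direct effect, a reference interaction, a mediated interaction and a pure indirect effect. The sketch below verifies that the four pieces add up to the total effect for every binary configuration. It does not implement the authors' natural counterfactual interaction effect, and the function names are hypothetical.

```python
from itertools import product


def four_way(y00, y01, y10, y11, m0, m1):
    """Four-way decomposition for binary exposure A and binary mediator M.

    y_am is the potential outcome under A=a, M=m; m_a is the potential mediator under A=a.
    Returns (CDE, INT_ref, INT_med, PIE).
    """
    interaction = y11 - y10 - y01 + y00
    cde = y10 - y00                    # controlled direct effect at M = 0
    int_ref = interaction * m0         # reference interaction
    int_med = interaction * (m1 - m0)  # mediated interaction
    pie = (y01 - y00) * (m1 - m0)      # pure indirect effect
    return cde, int_ref, int_med, pie


def total_effect(y00, y01, y10, y11, m0, m1):
    """Y_{1 M_1} - Y_{0 M_0}, using composition Y_a = Y_{a M_a}."""
    y1 = y11 if m1 else y10
    y0 = y01 if m0 else y00
    return y1 - y0


def identity_holds_everywhere():
    """Check CDE + INT_ref + INT_med + PIE == TE for every binary configuration."""
    return all(sum(four_way(*c)) == total_effect(*c) for c in product((0, 1), repeat=6))
```

The identity follows from writing $Y_{aM_a} = Y_{a0} + M_a(Y_{a1} - Y_{a0})$ and regrouping terms. The abstract points out that this bookkeeping becomes problematic once one mediator causally affects another.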
Submitted 13 April, 2020;
originally announced April 2020.
-
From scenario-based seismic hazard to scenario-based landslide hazard: rewinding to the past via statistical simulations
Authors:
Luguang Luo,
Luigi Lombardo,
Cees van Westen,
Xiangjun Pei,
Runqiu Huang
Abstract:
The vast majority of landslide susceptibility studies assume the slope instability process to be time-invariant under the premise that "the past and present are keys to the future". This assumption may generally be valid. However, the trigger, be it a rainfall or an earthquake event, clearly varies over time. And yet, the temporal component of the trigger is rarely included in landslide susceptibility studies and is usually confined to hazard assessment. In this work, we investigate a population of landslides triggered by the 2017 Jiuzhaigou earthquake ($M_w = 6.5$), including the associated ground motion in the analyses, which are carried out at the Slope Unit (SU) level. We do this by implementing a Bayesian version of a Generalized Additive Model and assuming that slope instability across the SUs in the study area follows a Bernoulli probability distribution. This procedure would generally produce a susceptibility map reflecting the spatial pattern of the specific trigger and would therefore be of limited use for land use planning. However, we implement this first analytical step to reliably estimate the ground motion effect, and its distribution, on unstable SUs. We then assume the effect of the ground motion to be time-invariant, enabling statistical simulations for any ground motion scenario that occurred in the area from 1933 to 2017. As a result, we obtain the full spectrum of potential susceptibility patterns over the last century and compress this information into a susceptibility model/map representative of all the possible ground motion patterns since 1933. These backward statistical simulations can also be exploited in the opposite direction: by accounting for scenario-based ground motion, one can estimate future unstable slopes.
Submitted 1 April, 2020;
originally announced April 2020.
-
Bayesian Nonparametric Space Partitions: A Survey
Authors:
Xuhui Fan,
Bin Li,
Ling Luo,
Scott A. Sisson
Abstract:
Bayesian nonparametric space partition (BNSP) models provide a variety of strategies for partitioning a $D$-dimensional space into a set of blocks, so that data points lying in the same block share certain kinds of homogeneity. BNSP models can be applied to various areas, such as regression/classification trees, random feature construction, and relational modeling. In this survey, we investigate the current progress of BNSP research from three perspectives. Under models, we review various strategies for generating the partitions in the space and discuss their theoretical foundation, 'self-consistency'. Under applications, we cover the current mainstream uses of BNSP models and their potential future practices. Under challenges, we identify current unsolved problems and valuable future research topics. As no comprehensive review of the BNSP literature existed before, we hope that this survey can induce further exploration and exploitation of this topic.
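A Mondrian process is one concrete example of a partition-generating strategy in this area. The sketch below samples from one on an axis-aligned box. A cut arrives after an exponential waiting time whose rate is the sum of the side lengths. The cut dimension is chosen with probability proportional to its length, the cut position is uniform within it, and both halves recurse with the remaining budget. It is a minimal, unoptimized sampler for illustration, and the function name is hypothetical.

```python
import random


def sample_mondrian(box, budget, rng=None):
    """Sample the leaf blocks of a Mondrian process partition of an axis-aligned box.

    box: list of (low, high) intervals, one per dimension.
    budget: the process's time budget (lifetime); larger values give finer partitions.
    """
    if rng is None:
        rng = random.Random()
    lengths = [hi - lo for lo, hi in box]
    rate = sum(lengths)
    # Waiting time until the next cut is exponential with rate = sum of side lengths.
    cost = rng.expovariate(rate) if rate > 0 else float("inf")
    if cost > budget:
        return [box]  # budget exhausted: this box is a leaf block
    # Cut dimension chosen with probability proportional to its side length.
    d = rng.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[d]
    cut = rng.uniform(lo, hi)
    left = box[:d] + [(lo, cut)] + box[d + 1:]
    right = box[:d] + [(cut, hi)] + box[d + 1:]
    remaining = budget - cost
    return sample_mondrian(left, remaining, rng) + sample_mondrian(right, remaining, rng)
```

The leaves always tile the original box exactly. Data points falling in the same leaf would then share block-level parameters, which is the homogeneity assumption the abstract describes.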
Submitted 28 February, 2021; v1 submitted 26 February, 2020;
originally announced February 2020.
-
Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems
Authors:
Luo Luo,
Haishan Ye,
Zhichao Huang,
Tong Zhang
Abstract:
We consider nonconvex-concave minimax optimization problems of the form $\min_{\bf x}\max_{\bf y\in{\mathcal Y}} f({\bf x},{\bf y})$, where $f$ is strongly concave in $\bf y$ but possibly nonconvex in $\bf x$, and ${\mathcal Y}$ is a convex and compact set. We focus on the stochastic setting, where we can only access an unbiased stochastic gradient estimate of $f$ at each iteration. This formulation includes many machine learning applications as special cases, such as robust optimization and adversarial training. We are interested in finding an ${\mathcal O}(\varepsilon)$-stationary point of the function $\Phi(\cdot)=\max_{\bf y\in{\mathcal Y}} f(\cdot, {\bf y})$. The most popular algorithm for this problem is stochastic gradient descent ascent, which requires $\mathcal O(\kappa^3\varepsilon^{-4})$ stochastic gradient evaluations, where $\kappa$ is the condition number. In this paper, we propose a novel method called Stochastic Recursive gradiEnt Descent Ascent (SREDA), which estimates gradients more efficiently using variance reduction. This method achieves the best known stochastic gradient complexity of ${\mathcal O}(\kappa^3\varepsilon^{-3})$, and its dependency on $\varepsilon$ is optimal for this problem.
Submitted 23 October, 2020; v1 submitted 11 January, 2020;
originally announced January 2020.
-
USTAR: Online Multimodal Embedding for Modeling User-Guided Spatiotemporal Activity
Authors:
Amila Silva,
Shanika Karunasekera,
Christopher Leckie,
Ling Luo
Abstract:
Building spatiotemporal models of people's activities in urban spaces is important for understanding the ever-increasing complexity of urban dynamics. With the emergence of Geo-Tagged Social Media (GTSM) records, previous studies have demonstrated the potential of GTSM records for spatiotemporal activity modeling. State-of-the-art methods for this task embed different modalities (location, time, and text) of GTSM records into a single embedding space. However, they ignore Non-GeoTagged Social Media (NGTSM) records, which generally account for the majority of posts (e.g., more than 95% on Twitter) and could be a valuable source of information for alleviating the sparsity of GTSM records. Furthermore, current spatiotemporal embedding techniques pay less attention to users, who exhibit spatially motivated behaviors. To bridge this research gap, this work proposes USTAR, a novel online learning method for User-guided SpatioTemporal Activity Representation, which (1) embeds locations, time, and text along with users into the same embedding space to capture their correlations; (2) uses a novel collaborative filtering approach based on two different empirically studied user behaviors to incorporate both NGTSM and GTSM records in learning; and (3) introduces a novel sampling technique to learn spatiotemporal representations in an online fashion, accommodating recent information into the embedding space while avoiding overfitting to recent records and frequently appearing units in social media streams. Our results show that USTAR substantially improves on the state of the art for region retrieval and keyword retrieval, and indicate its potential for other downstream applications such as local event detection.
Submitted 23 October, 2019;
originally announced October 2019.
-
A Stochastic Proximal Point Algorithm for Saddle-Point Problems
Authors:
Luo Luo,
Cheng Chen,
Yujun Li,
Guangzeng Xie,
Zhihua Zhang
Abstract:
We consider saddle point problems whose objective functions are the average of $n$ strongly convex-concave individual components. Recently, researchers have exploited variance reduction methods to solve such problems and achieved linear convergence guarantees. However, these methods converge slowly when the condition number of the problem is very large. In this paper, we propose a stochastic proximal point algorithm that accelerates the variance reduction method SAGA for saddle point problems. Compared with the catalyst framework, our algorithm removes a logarithmic factor of the condition number from the iteration complexity. We apply our algorithm to policy evaluation, and the empirical results show that our method is much more efficient than state-of-the-art methods.
Submitted 13 September, 2019;
originally announced September 2019.
-
A General Analysis Framework of Lower Complexity Bounds for Finite-Sum Optimization
Authors:
Guangzeng Xie,
Luo Luo,
Zhihua Zhang
Abstract:
This paper studies lower complexity bounds for optimization problems whose objective function is the average of $n$ individual smooth convex functions. We consider algorithms that have access to gradient and proximal oracles for each individual component. For the strongly convex case, we prove that such an algorithm cannot reach an $\varepsilon$-suboptimal point in fewer than $\Omega((n+\sqrt{\kappa n})\log(1/\varepsilon))$ iterations, where $\kappa$ is the condition number of the objective function. This lower bound is tighter than previous results and exactly matches the upper bound of the existing proximal incremental first-order oracle algorithm Point-SAGA. We develop a novel construction to show this result, which partitions the tridiagonal matrix of classical examples into $n$ groups. This construction is well suited to the analysis of proximal oracles and extends naturally to the general convex and average smooth cases.
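Point-SAGA, the algorithm whose upper bound is matched above, is short to state: each step applies a SAGA-style gradient-table correction and then the proximal operator of one sampled component. The pure-Python sketch below assumes ridge-regularized least-squares components f_i(x) = 0.5 * (a_i . x - b_i)^2 + (mu/2) * ||x||^2, so each prox has a closed form via Sherman-Morrison. The names `point_saga` and `full_gradient` are hypothetical, and the step-size rule is the one we recall from the original Point-SAGA analysis; treat it as an assumption.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def point_saga(A, b, mu, iters, seed=0):
    """Point-SAGA sketch for F(x) = (1/n) * sum_i f_i(x), with assumed components
    f_i(x) = 0.5 * (a_i . x - b_i)**2 + 0.5 * mu * ||x||**2."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    L = max(dot(a, a) for a in A) + mu  # smoothness constant shared by all f_i
    # Step size for the strongly convex case (assumed rule, see lead-in).
    gamma = (math.sqrt((n - 1) ** 2 + 4 * n * L / mu) / (2 * L * n)
             - (1 - 1 / n) / (2 * L))
    c = mu + 1.0 / gamma
    x = [0.0] * d
    # Gradient table at x0 = 0: grad f_i(0) = -b_i * a_i.
    g = [[-bi * aik for aik in a] for a, bi in zip(A, b)]
    g_bar = [sum(col) / n for col in zip(*g)]
    for _ in range(iters):
        j = rng.randrange(n)
        a, g_old = A[j], g[j]
        # SAGA-style correction before the proximal step.
        z = [xk + gamma * (gk - mk) for xk, gk, mk in zip(x, g_old, g_bar)]
        # prox_{gamma f_j}(z) solves (a a^T + c I) x = z / gamma + b_j a;
        # Sherman-Morrison gives the solution in O(d) operations.
        r = [zk / gamma + b[j] * ak for zk, ak in zip(z, a)]
        coef = dot(a, r) / (c + dot(a, a))
        x = [(rk - coef * ak) / c for rk, ak in zip(r, a)]
        # By prox optimality, (z - x) / gamma equals grad f_j at the new x.
        g_new = [(zk - xk) / gamma for zk, xk in zip(z, x)]
        g_bar = [mk + (gn - go) / n for mk, gn, go in zip(g_bar, g_new, g_old)]
        g[j] = g_new
    return x


def full_gradient(A, b, mu, x):
    """Gradient of F at x, used to check stationarity."""
    n = len(A)
    grad = [mu * xk for xk in x]
    for a, bi in zip(A, b):
        resid = dot(a, x) - bi
        grad = [gk + resid * ak / n for gk, ak in zip(grad, a)]
    return grad


# Usage on hypothetical random data: the full gradient should end up near zero.
data_rng = random.Random(1)
A = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(30)]
b = [data_rng.gauss(0, 1) for _ in range(30)]
x_hat = point_saga(A, b, mu=0.1, iters=20000)
print(max(abs(v) for v in full_gradient(A, b, 0.1, x_hat)))
```

Only one component gradient is stored per sample, which is what makes the method incremental; the prox step is what distinguishes it from plain SAGA.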
Submitted 22 August, 2019;
originally announced August 2019.
-
Task-Assisted Domain Adaptation with Anchor Tasks
Authors:
Zhizhong Li,
Linjie Luo,
Sergey Tulyakov,
Qieyun Dai,
Derek Hoiem
Abstract:
Some tasks, such as surface normal or single-view depth estimation, require per-pixel ground truth that is difficult to obtain on real images but easy to obtain on synthetic ones. However, models learned on synthetic images often do not generalize well to real images due to the domain shift. Our key idea to improve domain adaptation is to introduce a separate anchor task (such as facial landmarks) whose annotations can be obtained at no cost or are already available on both synthetic and real datasets. To further leverage the implicit relationship between the anchor and main tasks, we apply our \freeze technique, which learns the cross-task guidance on the source domain with the final network layers and uses it on the target domain. We evaluate our methods on surface normal estimation on two pairs of datasets (indoor scenes and faces) with two kinds of anchor tasks (semantic segmentation and facial landmarks). We show that blindly applying domain adaptation or training the auxiliary task on only one domain may hurt performance, while using anchor tasks on both domains is better behaved. Our \freeze technique outperforms competing approaches, reaching performance on facial images on par with a recently popular surface normal estimation method that uses shape-from-shading domain knowledge.
Submitted 9 November, 2020; v1 submitted 16 August, 2019;
originally announced August 2019.
-
The Age-Period-Cohort-Interaction Model for Describing and Investigating Inter-Cohort Deviations and Intra-Cohort Life-Course Dynamics
Authors:
Liying Luo,
James Hodges
Abstract:
Social scientists have frequently sought to understand the distinct effects of age, period, and cohort, but disaggregating the three dimensions is difficult because cohort = period - age. We argue that this technical difficulty reflects a disconnect between how the cohort effect is conceptualized and how it is modeled in the traditional age-period-cohort framework. We propose a new method, called the age-period-cohort-interaction (APC-I) model, that is qualitatively different from previous methods in that it represents Ryder's (1965) theoretical account of the conditions under which cohort differentiation may arise. The APC-I model does not require problematic statistical assumptions, and its interpretation is straightforward. It quantifies inter-cohort deviations from the age and period main effects and also permits hypothesis testing about intra-cohort life-course dynamics. We demonstrate how this new model can be used to examine age, period, and cohort patterns in women's labor force participation.
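The identification problem behind "cohort = period - age" can be seen directly: adding a linear cohort column next to linear age and period columns does not raise the rank of the design matrix. The sketch below illustrates this on a hypothetical table of 4 age groups observed in 5 periods, computing the exact rank with rational arithmetic. The helper `matrix_rank` and the data are our own illustration, not taken from the paper.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical age-by-period table; columns: intercept, age, period, cohort.
ages = range(20, 60, 10)
periods = range(1980, 2030, 10)
design = [[1, a, p, p - a] for a in ages for p in periods]

print(matrix_rank([row[:3] for row in design]))  # 3: intercept, age, period
print(matrix_rank(design))  # still 3: the cohort column adds no information
```

Any linear trend can therefore be shifted among age, period, and cohort without changing the fit, which is why some additional constraint or model structure is needed.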
Submitted 2 June, 2019;
originally announced June 2019.
-
Constraints in Random Effects Age-Period-Cohort Models
Authors:
Liying Luo,
James S. Hodges
Abstract:
Random effects (RE) models have been widely used to study the contextual effects of structures such as neighborhoods or schools. The RE approach has recently been applied to age-period-cohort (APC) models, which are unidentified because the predictors are exactly linearly dependent. However, it is not fully understood how the RE specification identifies these otherwise unidentified APC models. We address this challenge by first making explicit that RE-APC models have greater, not less, rank deficiency than the traditional fixed-effects model, followed by two empirical examples. We then provide intuition and a mathematical proof to explain that for APC models with one RE, treating one effect as an RE is equivalent to constraining the estimates of that effect's linear component and the random intercept to be zero. For APC models with two REs, the effective constraints implied by the model depend on the true (i.e., in the data-generating mechanism) non-linear components of the effects that are modeled as REs, so that the estimated linear components of the REs are determined by the true non-linear components of those effects. In conclusion, RE-APC models impose arbitrary though highly obscure constraints and thus do not differ qualitatively from other constrained APC estimators.
Submitted 16 April, 2019;
originally announced April 2019.
-
Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Authors:
Liangchen Luo,
Yuanhao Xiong,
Yan Liu,
Xu Sun
Abstract:
Adaptive optimization methods such as AdaGrad, RMSprop and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. Though prevalent, they are observed to generalize poorly compared with SGD, or even fail to converge, due to unstable and extreme learning rates. Recent work has put forward algorithms such as AMSGrad to tackle this issue, but they fail to achieve considerable improvement over existing methods. In our paper, we demonstrate that extreme learning rates can lead to poor performance. We provide new variants of Adam and AMSGrad, called AdaBound and AMSBound respectively, which employ dynamic bounds on learning rates to achieve a gradual and smooth transition from adaptive methods to SGD, and we give a theoretical proof of convergence. We further conduct experiments on various popular tasks and models, an evaluation often lacking in previous work. Experimental results show that the new variants can eliminate the generalization gap between adaptive methods and SGD while maintaining higher learning speed early in training. Moreover, they can bring significant improvement over their prototypes, especially on complex deep networks. The implementation of the algorithm can be found at https://github.com/Luolc/AdaBound .
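The core of the bounding idea fits in a few lines: Adam's per-coordinate step size is clipped into [lower(t), upper(t)], and both bounds converge to one final, SGD-like learning rate. The pure-Python sketch below assumes the schedules lower(t) = final_lr * (1 - 1/(gamma*t + 1)) and upper(t) = final_lr * (1 + 1/(gamma*t)), which is our reading of the reference implementation linked above. The function name `adabound_minimize` and its defaults are illustrative, not that repository's API.

```python
import math


def adabound_minimize(grad_fn, x0, lr=1e-3, final_lr=0.1, betas=(0.9, 0.999),
                      gamma=1e-3, eps=1e-8, steps=3000):
    """Minimal AdaBound-style sketch on a list of floats (illustrative only)."""
    beta1, beta2 = betas
    x = list(x0)
    m = [0.0] * len(x)  # first-moment estimate, as in Adam
    v = [0.0] * len(x)  # second-moment estimate, as in Adam
    for t in range(1, steps + 1):
        g = grad_fn(x)
        # Assumed dynamic bounds: wide early (Adam-like), both -> final_lr (SGD-like).
        lower = final_lr * (1 - 1 / (gamma * t + 1))
        upper = final_lr * (1 + 1 / (gamma * t))
        bias1 = 1 - beta1 ** t
        bias2 = 1 - beta2 ** t
        for k in range(len(x)):
            m[k] = beta1 * m[k] + (1 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
            # Adam's bias-corrected per-coordinate step size ...
            step = lr * math.sqrt(bias2) / bias1 / (math.sqrt(v[k]) + eps)
            # ... clipped element-wise into [lower, upper].
            step = min(max(step, lower), upper)
            x[k] -= step * m[k]
    return x


# Usage: minimize (x0 - 3)^2 + (x1 + 2)^2 starting from the origin.
target = [3.0, -2.0]
x_min = adabound_minimize(lambda x: [2 * (xi - ci) for xi, ci in zip(x, target)],
                          [0.0, 0.0])
print(x_min)  # should be very close to [3.0, -2.0]
```

Early on the upper bound is huge and the lower bound near zero, so the update behaves like Adam; as t grows the clip window narrows and the update approaches momentum SGD with rate final_lr.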
Submitted 26 February, 2019;
originally announced February 2019.
-
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Authors:
Spyridon Bakas,
Mauricio Reyes,
Andras Jakab,
Stefan Bauer,
Markus Rempfler,
Alessandro Crimi,
Russell Takeshi Shinohara,
Christoph Berger,
Sung Min Ha,
Martin Rozycki,
Marcel Prastawa,
Esther Alberts,
Jana Lipkova,
John Freymann,
Justin Kirby,
Michel Bilello,
Hassan Fathallah-Shaykh,
Roland Wiest,
Jan Kirschke,
Benedikt Wiestler,
Rivka Colen,
Aikaterini Kotrotsou,
Pamela Lamontagne,
Daniel Marcus,
Mikhail Milchenko
, et al. (402 additional authors not shown)
Abstract:
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is also a factor considered in longitudinal scans when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients who underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that, apart from being diverse in each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been continuously evolving and growing.
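Segmentations of glioma sub-regions are commonly scored region by region with the Dice similarity coefficient, 2|P ∩ T| / (|P| + |T|). The pure-Python sketch below computes it on flattened label maps. The label convention (1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor) and the composite regions (whole tumor {1, 2, 4}, tumor core {1, 4}, enhancing tumor {4}) are assumptions based on several BraTS editions, so check the documentation of the edition you use. Scoring two empty masks as 1.0 is also our own choice, and the voxel data are hypothetical.

```python
REGIONS = {  # assumed BraTS-style label groupings
    "whole_tumor": {1, 2, 4},
    "tumor_core": {1, 4},
    "enhancing_tumor": {4},
}


def dice(pred, truth):
    """Dice = 2|P & T| / (|P| + |T|) for binary masks given as flat sequences."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * inter / total


def region_dice(pred_labels, true_labels):
    """Dice per composite region for two flat label maps of equal length."""
    scores = {}
    for name, labels in REGIONS.items():
        p = [lab in labels for lab in pred_labels]
        t = [lab in labels for lab in true_labels]
        scores[name] = dice(p, t)
    return scores


# Toy 8-voxel example (hypothetical data).
pred = [0, 2, 2, 1, 4, 4, 0, 0]
truth = [0, 2, 1, 1, 4, 0, 0, 2]
print(region_dice(pred, truth))
```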
Submitted 23 April, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.
-
Close Yet Distinctive Domain Adaptation
Authors:
Lingkun Luo,
Xiaofang Wang,
Shiqiang Hu,
Chao Wang,
Yuxing Tang,
Liming Chen
Abstract:
Domain adaptation is a form of transfer learning that aims to generalize a learning model across training and testing data with different distributions. Most previous research tackles this problem by seeking a shared feature representation between source and target domains while reducing the mismatch of their data distributions. In this paper, we propose a close yet discriminative domain adaptation method, namely CDDA, which generates a latent feature representation with two interesting properties. First, the discrepancy between the source and target domains, measured in terms of both marginal and conditional probability distributions via Maximum Mean Discrepancy, is minimized so as to draw the two domains close to each other. More importantly, we also design a repulsive force term, which maximizes the distance between each label-dependent sub-domain and all others so as to push different class-dependent sub-domains far away from each other and thereby increase the discriminative power of the adapted domain. Moreover, given that the underlying data manifold could have a complex geometric structure, we further propose the constraints of label smoothness and geometric structure consistency for label propagation. Extensive experiments are conducted on 36 cross-domain image classification tasks over four public datasets. The comprehensive results show that the proposed method consistently outperforms state-of-the-art methods by significant margins.
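With a linear kernel, the empirical Maximum Mean Discrepancy between two samples reduces to the squared distance between their means, which makes the attractive (marginal and conditional) and repulsive terms easy to illustrate. The pure-Python sketch below is an assumption-laden illustration, not CDDA itself: it uses a linear kernel, target pseudo-labels for the conditional term, and a hypothetical repulsive term that sums, over classes, the MMD between one source class and all other source classes. All function names are our own.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_mmd2(xs, xt):
    """Squared MMD with a linear kernel: ||mean(xs) - mean(xt)||^2."""
    return sum((a - b) ** 2 for a, b in zip(mean_vector(xs), mean_vector(xt)))


def conditional_mmd2(xs, ys, xt, yt_pseudo):
    """Attractive term: class-wise linear MMD summed over shared classes."""
    total = 0.0
    for c in set(ys) & set(yt_pseudo):
        src = [x for x, y in zip(xs, ys) if y == c]
        tgt = [x for x, y in zip(xt, yt_pseudo) if y == c]
        total += linear_mmd2(src, tgt)
    return total


def repulsive_mmd2(xs, ys):
    """Hypothetical repulsive term: MMD between each class and all other classes."""
    total = 0.0
    for c in set(ys):
        inside = [x for x, y in zip(xs, ys) if y == c]
        outside = [x for x, y in zip(xs, ys) if y != c]
        if outside:
            total += linear_mmd2(inside, outside)
    return total


# Tiny hypothetical source/target samples with (pseudo-)labels.
xs, ys = [[0.0, 0.0], [2.0, 0.0]], [0, 1]
xt, yt = [[1.0, 0.0], [2.0, 0.0]], [0, 1]
print(linear_mmd2(xs, xt), conditional_mmd2(xs, ys, xt, yt), repulsive_mmd2(xs, ys))
```

An objective in this spirit would minimize the first two quantities while maximizing the third; how CDDA weights and optimizes them is described in the paper, not here.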
Submitted 13 April, 2017;
originally announced April 2017.
-
Communication Lower Bounds for Distributed Convex Optimization: Partition Data on Features
Authors:
Zihao Chen,
Luo Luo,
Zhihua Zhang
Abstract:
Recently, there has been increasing interest in designing distributed convex optimization algorithms for the setting where the data matrix is partitioned on features. Algorithms in this setting sometimes have many advantages over those in the setting where data is partitioned on samples, especially when the number of features is huge. It is therefore important to understand the inherent limitations of these optimization problems. In this paper, under certain restrictions on the communication allowed in the procedures, we develop tight lower bounds on communication rounds for a broad class of non-incremental algorithms in this setting. We also provide a lower bound on communication rounds for a class of (randomized) incremental algorithms.
Submitted 2 December, 2016;
originally announced December 2016.
-
A Proximal Stochastic Quasi-Newton Algorithm
Authors:
Luo Luo,
Zihao Chen,
Zhihua Zhang,
Wu-Jun Li
Abstract:
In this paper, we discuss the problem of minimizing the sum of two convex functions: a smooth function plus a non-smooth function. Further, the smooth part can be expressed as the average of a large number of smooth component functions, and the non-smooth part is equipped with a simple proximal mapping. We propose a proximal stochastic second-order method, which is efficient and scalable. It incorporates the Hessian of the smooth part of the function and exploits a multistage scheme to reduce the variance of the stochastic gradient. We prove that our method can achieve a linear rate of convergence.
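A "simple" proximal mapping usually means one with a closed form. As an example (our choice; the abstract does not name the non-smooth term), for h(x) = lam * ||x||_1 the proximal map is coordinate-wise soft-thresholding. The sketch below shows it together with one proximal-gradient step x <- prox(x - eta * grad f(x)). This is plain first-order proximal gradient, not the second-order, variance-reduced method the paper proposes, and the function names are hypothetical.

```python
def prox_l1(v, t):
    """Proximal map of t * ||.||_1 at v: shrink each coordinate toward zero by t."""
    return [vi - t if vi > t else vi + t if vi < -t else 0.0 for vi in v]


def prox_grad_step(x, grad, eta, lam):
    """One proximal-gradient step for f(x) + lam * ||x||_1 with step size eta."""
    return prox_l1([xi - eta * gi for xi, gi in zip(x, grad)], eta * lam)


print(prox_l1([3.0, -0.5, -2.0], 1.0))  # [2.0, 0.0, -1.0]

# For f(x) = 0.5 * ||x - c||^2 (so grad f(x) = x - c), one step with eta = 1
# from any x lands on the exact minimizer of f(x) + lam * ||x||_1.
c = [3.0, -0.5, -2.0]
x = [0.0, 0.0, 0.0]
x_next = prox_grad_step(x, [xi - ci for xi, ci in zip(x, c)], eta=1.0, lam=1.0)
print(x_next)  # [2.0, 0.0, -1.0]
```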
Submitted 16 November, 2016; v1 submitted 31 January, 2016;
originally announced February 2016.
-
A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics
Authors:
Kun He,
Yan Fu,
Wen-Feng Zeng,
Lan Luo,
Hao Chi,
Chao Liu,
Lai-Yun Qing,
Rui-Xiang Sun,
Si-Min He
Abstract:
Motivation: Target-decoy search (TDS) is currently the most popular strategy for estimating and controlling the false discovery rate (FDR) of peptide identifications in mass spectrometry-based shotgun proteomics. While this strategy is very useful in practice and has been intensively studied empirically, its theoretical foundation has not yet been well established. Result: In this work, we systematically analyze the TDS strategy in a rigorous statistical sense. We prove that the commonly used concatenated TDS provides a conservative estimate of the FDR for any given score threshold, but it cannot rigorously control the FDR. We prove that with a slight modification to the commonly used formula for FDR estimation, the peptide-level FDR can be rigorously controlled based on the concatenated TDS. We show that spectrum-level FDR control is difficult. We verify the theoretical conclusions with real mass spectrometry data.
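In a concatenated target-decoy search, the usual FDR estimate at a score threshold is the number of decoy hits divided by the number of target hits above that threshold. The pure-Python sketch below implements that estimate and a simple threshold search. The optional `plus_one` flag adds 1 to the decoy count; we assume this is the kind of "slight modification" meant above, but the paper should be consulted for its exact formula. The function names and scores are hypothetical.

```python
def tds_fdr(scores, is_decoy, threshold, plus_one=False):
    """Estimated FDR among matches scoring >= threshold in a concatenated
    target-decoy search: decoys / targets, or (decoys + 1) / targets."""
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    numerator = decoys + (1 if plus_one else 0)
    return min(1.0, numerator / max(targets, 1))


def score_threshold(scores, is_decoy, alpha, plus_one=False):
    """Lowest observed score whose estimated FDR is <= alpha, or None."""
    for t in sorted(set(scores)):
        if tds_fdr(scores, is_decoy, t, plus_one) <= alpha:
            return t
    return None


# Hypothetical match scores; True marks a decoy hit.
scores = [9, 8, 7, 6, 5, 4, 3, 2]
is_decoy = [False, False, False, True, False, False, True, True]
print(score_threshold(scores, is_decoy, 0.25))                 # 4
print(score_threshold(scores, is_decoy, 0.25, plus_one=True))  # None
```

On this toy list the +1 correction is noticeably more conservative: no threshold passes at alpha = 0.25, whereas the plain estimate accepts everything scoring 4 or more.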
Submitted 3 January, 2015;
originally announced January 2015.