-
Contrastive independent component analysis
Authors:
Kexin Wang,
Aida Maraj,
Anna Seigal
Abstract:
Visualizing data and finding patterns in data are ubiquitous problems in the sciences. Increasingly, applications seek signal and structure in a contrastive setting: a foreground dataset relative to a background dataset. For this purpose, we propose contrastive independent component analysis (cICA). This generalizes independent component analysis to independent latent variables across a foreground…
▽ More
Visualizing data and finding patterns in data are ubiquitous problems in the sciences. Increasingly, applications seek signal and structure in a contrastive setting: a foreground dataset relative to a background dataset. For this purpose, we propose contrastive independent component analysis (cICA). This generalizes independent component analysis to independent latent variables across a foreground and background. We propose a hierarchical tensor decomposition algorithm for cICA. We study the identifiability of cICA and demonstrate its performance visualizing data and finding patterns in data, using synthetic and real-world datasets, comparing the approach to existing contrastive methods.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
CLEAR: Can Language Models Really Understand Causal Graphs?
Authors:
Sirui Chen,
Mengying Xu,
Kun Wang,
Xingyu Zeng,
Rui Zhao,
Shengjie Zhao,
Chaochao Lu
Abstract:
Causal reasoning is a cornerstone of how humans interpret the world. To model and reason about causality, causal graphs offer a concise yet effective solution. Given the impressive advancements in language models, a crucial question arises: can they really understand causal graphs? To this end, we pioneer an investigation into language models' understanding of causal graphs. Specifically, we devel…
▽ More
Causal reasoning is a cornerstone of how humans interpret the world. To model and reason about causality, causal graphs offer a concise yet effective solution. Given the impressive advancements in language models, a crucial question arises: can they really understand causal graphs? To this end, we pioneer an investigation into language models' understanding of causal graphs. Specifically, we develop a framework to define causal graph understanding, by assessing language models' behaviors through four practical criteria derived from diverse disciplines (e.g., philosophy and psychology). We then develop CLEAR, a novel benchmark that defines three complexity levels and encompasses 20 causal graph-based tasks across these levels. Finally, based on our framework and benchmark, we conduct extensive experiments on six leading language models and summarize five empirical findings. Our results indicate that while language models demonstrate a preliminary understanding of causal graphs, significant potential for improvement remains. Our project website is at https://github.com/OpenCausaLab/CLEAR.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Distribution-Free Predictive Inference under Unknown Temporal Drift
Authors:
Elise Han,
Chengpiao Huang,
Kaizheng Wang
Abstract:
Distribution-free prediction sets play a pivotal role in uncertainty quantification for complex statistical models. Their validity hinges on reliable calibration data, which may not be readily available as real-world environments often undergo unknown changes over time. In this paper, we propose a strategy for choosing an adaptive window and use the data therein to construct prediction sets. The w…
▽ More
Distribution-free prediction sets play a pivotal role in uncertainty quantification for complex statistical models. Their validity hinges on reliable calibration data, which may not be readily available as real-world environments often undergo unknown changes over time. In this paper, we propose a strategy for choosing an adaptive window and use the data therein to construct prediction sets. The window is selected by optimizing an estimated bias-variance tradeoff. We provide sharp coverage guarantees for our method, showing its adaptivity to the underlying temporal drift. We also illustrate its efficacy through numerical experiments on synthetic and real data.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Spatio-temporal Joint Analysis of PM2.5 and Ozone in California with INLA
Authors:
Jianan Pan,
Kunyang He,
Kai Wang,
Qing Mu,
Chengxiu Ling
Abstract:
The substantial threat of concurrent air pollutants to public health is increasingly severe under climate change. To identify the common drivers and extent of spatio-temporal similarity of PM2.5 and ozone, this paper proposed a log Gaussian-Gumbel Bayesian hierarchical model allowing for sharing a SPDE-AR(1) spatio-temporal interaction structure. The proposed model outperforms in terms of estimati…
▽ More
The substantial threat of concurrent air pollutants to public health is increasingly severe under climate change. To identify the common drivers and extent of spatio-temporal similarity of PM2.5 and ozone, this paper proposed a log Gaussian-Gumbel Bayesian hierarchical model allowing for sharing a SPDE-AR(1) spatio-temporal interaction structure. The proposed model outperforms in terms of estimation accuracy and prediction capacity for its increased parsimony and reduced uncertainty, especially for the shared ozone sub-model. Besides the consistently significant influence of temperature (positive), extreme drought (positive), fire burnt area (positive), and wind speed (negative) on both PM2.5 and ozone, surface pressure and GDP per capita (precipitation) demonstrate only positive associations with PM2.5 (ozone), while population density relates to neither. In addition, our results show the distinct spatio-temporal interactions and different seasonal patterns of PM2.5 and ozone, with peaks of PM2.5 and ozone in cold and hot seasons, respectively. Finally, with the aid of the excursion function, we see that the areas around the intersection of San Luis Obispo and Santa Barbara counties are likely to exceed the unhealthy ozone level for sensitive groups throughout the year. Our findings provide new insights for regional and seasonal strategies in the co-control of PM2.5 and ozone. Our methodology is expected to be utilized when interest lies in multiple interrelated processes in the fields of environment and epidemiology.
△ Less
Submitted 20 April, 2024;
originally announced April 2024.
-
Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes
Authors:
Andrew Bennett,
Nathan Kallus,
Miruna Oprescu,
Wen Sun,
Kaiwen Wang
Abstract:
We study evaluating a policy under best- and worst-case perturbations to a Markov decision process (MDP), given transition observations from the original MDP, whether under the same or different policy. This is an important problem when there is the possibility of a shift between historical and future environments, due to e.g. unmeasured confounding, distributional shift, or an adversarial environ…
▽ More
We study evaluating a policy under best- and worst-case perturbations to a Markov decision process (MDP), given transition observations from the original MDP, whether under the same or different policy. This is an important problem when there is the possibility of a shift between historical and future environments, due to e.g. unmeasured confounding, distributional shift, or an adversarial environment. We propose a perturbation model that can modify transition kernel densities up to a given multiplicative factor or its reciprocal, which extends the classic marginal sensitivity model (MSM) for single time step decision making to infinite-horizon RL. We characterize the sharp bounds on policy value under this model, that is, the tightest possible bounds given by the transition observations from the original MDP, and we study the estimation of these bounds from such transition observations. We develop an estimator with several appealing guarantees: it is semiparametrically efficient, and remains so even when certain necessary nuisance functions such as worst-case Q-functions are estimated at slow nonparametric rates. It is also asymptotically normal, enabling easy statistical inference using Wald confidence intervals. In addition, when certain nuisances are estimated inconsistently we still estimate a valid, albeit possibly not sharp bounds on the policy value. We validate these properties in numeric simulations. The combination of accounting for environment shifts from train to test (robustness), being insensitive to nuisance-function estimation (orthogonality), and accounting for having only finite samples to learn from (inference) together leads to credible and reliable policy evaluation.
△ Less
Submitted 29 March, 2024;
originally announced April 2024.
-
Analysis of singular subspaces under random perturbations
Authors:
Ke Wang
Abstract:
We present a comprehensive analysis of singular vector and singular subspace perturbations in the context of the signal plus random Gaussian noise matrix model. Assuming a low-rank signal matrix, we extend the Davis-Kahan-Wedin theorem in a fully generalized manner, applicable to any unitarily invariant matrix norm, extending previous results of O'Rourke, Vu and the author. We also obtain the fine…
▽ More
We present a comprehensive analysis of singular vector and singular subspace perturbations in the context of the signal plus random Gaussian noise matrix model. Assuming a low-rank signal matrix, we extend the Davis-Kahan-Wedin theorem in a fully generalized manner, applicable to any unitarily invariant matrix norm, extending previous results of O'Rourke, Vu and the author. We also obtain the fine-grained results, which encompass the $\ell_\infty$ analysis of singular vectors, the $\ell_{2, \infty}$ analysis of singular subspaces, as well as the exploration of linear and bilinear functions related to the singular vectors. Moreover, we explore the practical implications of these findings, in the context of the Gaussian mixture model and the submatrix localization problem.
△ Less
Submitted 19 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Model Assessment and Selection under Temporal Distribution Shift
Authors:
Elise Han,
Chengpiao Huang,
Kaizheng Wang
Abstract:
We investigate model assessment and selection in a changing environment, by synthesizing datasets from both the current time period and historical epochs. To tackle unknown and potentially arbitrary temporal distribution shift, we develop an adaptive rolling window approach to estimate the generalization error of a given model. This strategy also facilitates the comparison between any two candidat…
▽ More
We investigate model assessment and selection in a changing environment, by synthesizing datasets from both the current time period and historical epochs. To tackle unknown and potentially arbitrary temporal distribution shift, we develop an adaptive rolling window approach to estimate the generalization error of a given model. This strategy also facilitates the comparison between any two candidate models by estimating the difference of their generalization errors. We further integrate pairwise comparisons into a single-elimination tournament, achieving near-optimal model selection from a collection of candidates. Theoretical analyses and numerical experiments demonstrate the adaptivity of our proposed methods to the non-stationarity in data.
△ Less
Submitted 3 June, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild
Authors:
Ke Alexander Wang,
Emily B. Fox
Abstract:
Diabetes encompasses a complex landscape of glycemic control that varies widely among individuals. However, current methods do not faithfully capture this variability at the meal level. On the one hand, expert-crafted features lack the flexibility of data-driven methods; on the other hand, learned representations tend to be uninterpretable which hampers clinical adoption. In this paper, we propose…
▽ More
Diabetes encompasses a complex landscape of glycemic control that varies widely among individuals. However, current methods do not faithfully capture this variability at the meal level. On the one hand, expert-crafted features lack the flexibility of data-driven methods; on the other hand, learned representations tend to be uninterpretable which hampers clinical adoption. In this paper, we propose a hybrid variational autoencoder to learn interpretable representations of CGM and meal data. Our method grounds the latent space to the inputs of a mechanistic differential equation, producing embeddings that reflect physiological quantities, such as insulin sensitivity, glucose effectiveness, and basal glucose levels. Moreover, we introduce a novel method to infer the glucose appearance rate, making the mechanistic model robust to unreliable meal logs. On a dataset of CGM and self-reported meals from individuals with type-2 diabetes and pre-diabetes, our unsupervised representation discovers a separation between individuals proportional to their disease severity. Our embeddings produce clusters that are up to 4x better than naive, expert, black-box, and pure mechanistic features. Our method provides a nuanced, yet interpretable, embedding space to compare glycemic control within and across individuals, directly learnable from in-the-wild data.
△ Less
Submitted 6 December, 2023;
originally announced December 2023.
-
Multivariate Unified Skew-t Distributions And Their Properties
Authors:
Kesen Wang,
Maicon J. Karling,
Reinaldo B. Arellano-Valle,
Marc G. Genton
Abstract:
The unified skew-t (SUT) is a flexible parametric multivariate distribution that accounts for skewness and heavy tails in the data. A few of its properties can be found scattered in the literature or in a parameterization that does not follow the original one for unified skew-normal (SUN) distributions, yet a systematic study is lacking. In this work, explicit properties of the multivariate SUT di…
▽ More
The unified skew-t (SUT) is a flexible parametric multivariate distribution that accounts for skewness and heavy tails in the data. A few of its properties can be found scattered in the literature or in a parameterization that does not follow the original one for unified skew-normal (SUN) distributions, yet a systematic study is lacking. In this work, explicit properties of the multivariate SUT distribution are presented, such as its stochastic representations, moments, SUN-scale mixture representation, linear transformation, additivity, marginal distribution, canonical form, quadratic form, conditional distribution, change of latent dimensions, Mardia measures of multivariate skewness and kurtosis, and non-identifiability issue. These results are given in a parametrization that reduces to the original SUN distribution as a sub-model, hence facilitating the use of the SUT for applications. Several models based on the SUT distribution are provided for illustration.
△ Less
Submitted 30 November, 2023;
originally announced November 2023.
-
Do we listen to what we are told? An empirical study on human behaviour during the COVID-19 pandemic: neural networks vs. regression analysis
Authors:
Yuxi Heluo,
Kexin Wang,
Charles W. Robson
Abstract:
In this work, we contribute the first visual open-source empirical study on human behaviour during the COVID-19 pandemic, in order to investigate how compliant a general population is to mask-wearing-related public-health policy. Object-detection-based convolutional neural networks, regression analysis and multilayer perceptrons are combined to analyse visual data of the Viennese public during 202…
▽ More
In this work, we contribute the first visual open-source empirical study on human behaviour during the COVID-19 pandemic, in order to investigate how compliant a general population is to mask-wearing-related public-health policy. Object-detection-based convolutional neural networks, regression analysis and multilayer perceptrons are combined to analyse visual data of the Viennese public during 2020. We find that mask-wearing-related government regulations and public-transport announcements encouraged correct mask-wearing-behaviours during the COVID-19 pandemic. Importantly, changes in announcement and regulation contents led to heterogeneous effects on people's behaviour. Comparing the predictive power of regression analysis and neural networks, we demonstrate that the latter produces more accurate predictions of population reactions during the COVID-19 pandemic. Our use of regression modelling also allows us to unearth possible causal pathways underlying societal behaviour. Since our findings highlight the importance of appropriate communication contents, our results will facilitate more effective non-pharmaceutical interventions to be developed in future. Adding to the literature, we demonstrate that regression modelling and neural networks are not mutually exclusive but instead complement each other.
△ Less
Submitted 21 November, 2023;
originally announced November 2023.
-
A Stability Principle for Learning under Non-Stationarity
Authors:
Chengpiao Huang,
Kaizheng Wang
Abstract:
We develop a versatile framework for statistical learning in non-stationary environments. In each time period, our approach applies a stability principle to select a look-back window that maximizes the utilization of historical data while kee** the cumulative bias within an acceptable range relative to the stochastic error. Our theory showcases the adaptability of this approach to unknown non-st…
▽ More
We develop a versatile framework for statistical learning in non-stationary environments. In each time period, our approach applies a stability principle to select a look-back window that maximizes the utilization of historical data while kee** the cumulative bias within an acceptable range relative to the stochastic error. Our theory showcases the adaptability of this approach to unknown non-stationarity. The regret bound is minimax optimal up to logarithmic factors when the population losses are strongly convex, or Lipschitz only. At the heart of our analysis lie two novel components: a measure of similarity between functions and a segmentation technique for dividing the non-stationary data sequence into quasi-stationary pieces.
△ Less
Submitted 22 January, 2024; v1 submitted 27 October, 2023;
originally announced October 2023.
-
Which Parameterization of the Matérn Covariance Function?
Authors:
Kesen Wang,
Sameh Abdulah,
Ying Sun,
Marc G. Genton
Abstract:
The Matérn family of covariance functions is currently the most popularly used model in spatial statistics, geostatistics, and machine learning to specify the correlation between two geographical locations based on spatial distance. Compared to existing covariance functions, the Matérn family has more flexibility in data fitting because it allows the control of the field smoothness through a dedic…
▽ More
The Matérn family of covariance functions is currently the most popularly used model in spatial statistics, geostatistics, and machine learning to specify the correlation between two geographical locations based on spatial distance. Compared to existing covariance functions, the Matérn family has more flexibility in data fitting because it allows the control of the field smoothness through a dedicated parameter. Moreover, it generalizes other popular covariance functions. However, fitting the smoothness parameter is computationally challenging since it complicates the optimization process. As a result, some practitioners set the smoothness parameter at an arbitrary value to reduce the optimization convergence time. In the literature, studies have used various parameterizations of the Matérn covariance function, assuming they are equivalent. This work aims at studying the effectiveness of different parameterizations under various settings. We demonstrate the feasibility of inferring all parameters simultaneously and quantifying their uncertainties on large-scale data using the ExaGeoStat parallel software. We also highlight the importance of the smoothness parameter by analyzing the Fisher information of the statistical parameters. We show that the various parameterizations have different properties and differ from several perspectives. In particular, we study the three most popular parameterizations in terms of parameter estimation accuracy, modeling accuracy and efficiency, prediction efficiency, uncertainty quantification, and asymptotic properties. We further demonstrate their differing performances under nugget effects and approximated covariance. Lastly, we give recommendations for parameterization selection based on our experimental results.
△ Less
Submitted 28 August, 2023;
originally announced September 2023.
-
Computing SHAP Efficiently Using Model Structure Information
Authors:
Linwei Hu,
Ke Wang
Abstract:
SHAP (SHapley Additive exPlanations) has become a popular method to attribute the prediction of a machine learning model on an input to its features. One main challenge of SHAP is the computation time. An exact computation of Shapley values requires exponential time complexity. Therefore, many approximation methods are proposed in the literature. In this paper, we propose methods that can compute…
▽ More
SHAP (SHapley Additive exPlanations) has become a popular method to attribute the prediction of a machine learning model on an input to its features. One main challenge of SHAP is the computation time. An exact computation of Shapley values requires exponential time complexity. Therefore, many approximation methods are proposed in the literature. In this paper, we propose methods that can compute SHAP exactly in polynomial time or even faster for SHAP definitions that satisfy our additivity and dummy assumptions (eg, kernal SHAP and baseline SHAP). We develop different strategies for models with different levels of model structure information: known functional decomposition, known order of model (defined as highest order of interaction in the model), or unknown order. For the first case, we demonstrate an additive property and a way to compute SHAP from the lower-order functional components. For the second case, we derive formulas that can compute SHAP in polynomial time. Both methods yield exact SHAP results. Finally, if even the order of model is unknown, we propose an iterative way to approximate Shapley values. The three methods we propose are computationally efficient when the order of model is not high which is typically the case in practice. We compare with sampling approach proposed in Castor & Gomez (2008) using simulation studies to demonstrate the efficacy of our proposed methods.
△ Less
Submitted 5 September, 2023;
originally announced September 2023.
-
Random-Set Convolutional Neural Network (RS-CNN) for Epistemic Deep Learning
Authors:
Shireen Kudukkil Manchingal,
Muhammad Mubashar,
Kaizheng Wang,
Keivan Shariatmadar,
Fabio Cuzzolin
Abstract:
Machine learning is increasingly deployed in safety-critical domains where robustness against adversarial attacks is crucial and erroneous predictions could lead to potentially catastrophic consequences. This highlights the need for learning systems to be equipped with the means to determine a model's confidence in its prediction and the epistemic uncertainty associated with it, 'to know when a mo…
▽ More
Machine learning is increasingly deployed in safety-critical domains where robustness against adversarial attacks is crucial and erroneous predictions could lead to potentially catastrophic consequences. This highlights the need for learning systems to be equipped with the means to determine a model's confidence in its prediction and the epistemic uncertainty associated with it, 'to know when a model does not know'. In this paper, we propose a novel Random-Set Convolutional Neural Network (RS-CNN) for classification which predicts belief functions rather than probability vectors over the set of classes, using the mathematics of random sets, i.e., distributions over the power set of the sample space. Based on the epistemic deep learning approach, random-set models are capable of representing the 'epistemic' uncertainty induced in machine learning by limited training sets. We estimate epistemic uncertainty by approximating the size of credal sets associated with the predicted belief functions, and experimentally demonstrate how our approach outperforms competing uncertainty-aware approaches in a classical evaluation setting. The performance of RS-CNN is best demonstrated on OOD samples where it manages to capture the true prediction while standard CNNs fail.
△ Less
Submitted 11 July, 2023;
originally announced July 2023.
-
On Regularization and Inference with Label Constraints
Authors:
Kaifu Wang,
Hangfeng He,
Tin D. Nguyen,
Piyush Kumar,
Dan Roth
Abstract:
Prior knowledge and symbolic rules in machine learning are often expressed in the form of label constraints, especially in structured prediction problems. In this work, we compare two common strategies for encoding label constraints in a machine learning pipeline, regularization with constraints and constrained inference, by quantifying their impact on model performance. For regularization, we sho…
▽ More
Prior knowledge and symbolic rules in machine learning are often expressed in the form of label constraints, especially in structured prediction problems. In this work, we compare two common strategies for encoding label constraints in a machine learning pipeline, regularization with constraints and constrained inference, by quantifying their impact on model performance. For regularization, we show that it narrows the generalization gap by precluding models that are inconsistent with the constraints. However, its preference for small violations introduces a bias toward a suboptimal model. For constrained inference, we show that it reduces the population risk by correcting a model's violation, and hence turns the violation into an advantage. Given these differences, we further explore the use of two approaches together and propose conditions for constrained inference to compensate for the bias introduced by regularization, aiming to improve both the model complexity and optimal risk.
△ Less
Submitted 7 July, 2023;
originally announced July 2023.
-
A Transparent and Nonlinear Method for Variable Selection
Authors:
Keyao Wang,
Huiwen Wang,
Jichang Zhao,
Lihong Wang
Abstract:
Variable selection is a procedure to attain the truly important predictors from inputs. Complex nonlinear dependencies and strong coupling pose great challenges for variable selection in high-dimensional data. In addition, real-world applications have increased demands for interpretability of the selection process. A pragmatic approach should not only attain the most predictive covariates, but als…
▽ More
Variable selection is a procedure to attain the truly important predictors from inputs. Complex nonlinear dependencies and strong coupling pose great challenges for variable selection in high-dimensional data. In addition, real-world applications have increased demands for interpretability of the selection process. A pragmatic approach should not only attain the most predictive covariates, but also provide ample and easy-to-understand grounds for removing certain covariates. In view of these requirements, this paper puts forward an approach for transparent and nonlinear variable selection. In order to transparently decouple information within the input predictors, a three-step heuristic search is designed, via which the input predictors are grouped into four subsets: the relevant to be selected, and the uninformative, redundant, and conditionally independent to be removed. A nonlinear partial correlation coefficient is introduced to better identify the predictors which have nonlinear functional dependence with the response. The proposed method is model-free and the selected subset can be competent input for commonly used predictive models. Experiments demonstrate the superior performance of the proposed method against the state-of-the-art baselines in terms of prediction accuracy and model interpretability.
△ Less
Submitted 30 June, 2023;
originally announced July 2023.
-
On Learning Latent Models with Multi-Instance Weak Supervision
Authors:
Kaifu Wang,
Efi Tsamoura,
Dan Roth
Abstract:
We consider a weakly supervised learning scenario where the supervision signal is generated by a transition function $σ$ of labels associated with multiple input instances. We formulate this problem as \emph{multi-instance Partial Label Learning (multi-instance PLL)}, which is an extension to the standard PLL problem. Our problem is met in different fields, including latent structural learning and…
▽ More
We consider a weakly supervised learning scenario where the supervision signal is generated by a transition function $σ$ of labels associated with multiple input instances. We formulate this problem as \emph{multi-instance Partial Label Learning (multi-instance PLL)}, which is an extension to the standard PLL problem. Our problem is met in different fields, including latent structural learning and neuro-symbolic integration. Despite the existence of many learning techniques, limited theoretical analysis has been dedicated to this problem. In this paper, we provide the first theoretical study of multi-instance PLL with possibly an unknown transition $σ$. Our main contributions are as follows. Firstly, we propose a necessary and sufficient condition for the learnability of the problem. This condition non-trivially generalizes and relaxes the existing small ambiguity degree in the PLL literature, since we allow the transition to be deterministic. Secondly, we derive Rademacher-style error bounds based on a top-$k$ surrogate loss that is widely used in the neuro-symbolic literature. Furthermore, we conclude with empirical experiments for learning under unknown transitions. The empirical results align with our theoretical findings; however, they also expose the issue of scalability in the weak supervision literature.
△ Less
Submitted 23 June, 2023;
originally announced June 2023.
-
Bayesian Inference for Multivariate Monotone Densities
Authors:
Kang Wang,
Subhashis Ghosal
Abstract:
We consider a nonparametric Bayesian approach to estimation and testing for a multivariate monotone density. Instead of following the conventional Bayesian route of putting a prior distribution complying with the monotonicity restriction, we put a prior on the step heights through binning and a Dirichlet distribution. An arbitrary piece-wise constant probability density is converted to a monotone…
▽ More
We consider a nonparametric Bayesian approach to estimation and testing for a multivariate monotone density. Instead of following the conventional Bayesian route of putting a prior distribution complying with the monotonicity restriction, we put a prior on the step heights through binning and a Dirichlet distribution. An arbitrary piece-wise constant probability density is converted to a monotone one by a projection map, taking its $\mathbb{L}_1$-projection onto the space of monotone functions, which is subsequently normalized to integrate to one. We construct consistent Bayesian tests to test multivariate monotonicity of a probability density based on the $\mathbb{L}_1$-distance to the class of monotone functions. The test is shown to have a size going to zero and high power against alternatives sufficiently separated from the null hypothesis. To obtain a Bayesian credible interval for the value of the density function at an interior point with guaranteed asymptotic frequentist coverage, we consider a posterior quantile interval of an induced map transforming the function value to its value optimized over certain blocks. The limiting coverage is explicitly calculated and is seen to be higher than the credibility level used in the construction. By exploring the asymptotic relationship between the coverage and the credibility, we show that a desired asymptomatic coverage can be obtained exactly by starting with an appropriate credibility level.
△ Less
Submitted 8 June, 2023;
originally announced June 2023.
-
Bayesian Inference for $k$-Monotone Densities with Applications to Multiple Testing
Authors:
Kang Wang,
Subhashis Ghosal
Abstract:
Shape restriction, like monotonicity or convexity, imposed on a function of interest, such as a regression or density function, allows for its estimation without smoothness assumptions. The concept of $k$-monotonicity encompasses a family of shape restrictions, including decreasing and convex decreasing as special cases corresponding to $k=1$ and $k=2$. We consider Bayesian approaches to estimate…
▽ More
Shape restriction, like monotonicity or convexity, imposed on a function of interest, such as a regression or density function, allows for its estimation without smoothness assumptions. The concept of $k$-monotonicity encompasses a family of shape restrictions, including decreasing and convex decreasing as special cases corresponding to $k=1$ and $k=2$. We consider Bayesian approaches to estimate a $k$-monotone density. By utilizing a kernel mixture representation and putting a Dirichlet process or a finite mixture prior on the mixing distribution, we show that the posterior contraction rate in the Hellinger distance is $(n/\log n)^{- k/(2k + 1)}$ for a $k$-monotone density, which is minimax optimal up to a polylogarithmic factor. When the true $k$-monotone density is a finite $J_0$-component mixture of the kernel, the contraction rate improves to the nearly parametric rate $\sqrt{(J_0 \log n)/n}$. Moreover, by putting a prior on $k$, we show that the same rates hold even when the best value of $k$ is unknown. A specific application in modeling the density of $p$-values in a large-scale multiple testing problem is considered. Simulation studies are conducted to evaluate the performance of the proposed method.
△ Less
Submitted 8 June, 2023;
originally announced June 2023.
-
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning
Authors:
Kaiwen Wang,
Kevin Zhou,
Runzhe Wu,
Nathan Kallus,
Wen Sun
Abstract:
While distributional reinforcement learning (DistRL) has been empirically effective, the question of when and why it is better than vanilla, non-distributional RL has remained unanswered. This paper explains the benefits of DistRL through the lens of small-loss bounds, which are instance-dependent bounds that scale with optimal achievable cost. Particularly, our bounds converge much faster than th…
▽ More
While distributional reinforcement learning (DistRL) has been empirically effective, the question of when and why it is better than vanilla, non-distributional RL has remained unanswered. This paper explains the benefits of DistRL through the lens of small-loss bounds, which are instance-dependent bounds that scale with optimal achievable cost. Particularly, our bounds converge much faster than those from non-distributional approaches if the optimal cost is small. As warmup, we propose a distributional contextual bandit (DistCB) algorithm, which we show enjoys small-loss regret bounds and empirically outperforms the state-of-the-art on three real-world tasks. In online RL, we propose a DistRL algorithm that constructs confidence sets using maximum likelihood estimation. We prove that our algorithm enjoys novel small-loss PAC bounds in low-rank MDPs. As part of our analysis, we introduce the $\ell_1$ distributional eluder dimension which may be of independent interest. Then, in offline RL, we show that pessimistic DistRL enjoys small-loss PAC bounds that are novel to the offline setting and are more robust to bad single-policy coverage.
△ Less
Submitted 22 September, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Sequence Modeling with Multiresolution Convolutional Memory
Authors:
Jiaxin Shi,
Ke Alexander Wang,
Emily B. Fox
Abstract:
Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in the space tradeoff between the memory burden of brute-force enumeration and comparison, as in transformers, the computational burden of complicated sequential dependencies, as in recurrent neural n…
▽ More
Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in the space tradeoff between the memory burden of brute-force enumeration and comparison, as in transformers, the computational burden of complicated sequential dependencies, as in recurrent neural networks, or the parameter burden of convolutional networks with many or large filters. We instead take inspiration from wavelet-based multiresolution analysis to define a new building block for sequence modeling, which we call a MultiresLayer. The key component of our model is the multiresolution convolution, capturing multiscale trends in the input sequence. Our MultiresConv can be implemented with shared filters across a dilated causal convolution tree. Thus it garners the computational advantages of convolutional networks and the principled theoretical motivation of wavelet decompositions. Our MultiresLayer is straightforward to implement, requires significantly fewer parameters, and maintains at most a $\mathcal{O}(N\log N)$ memory footprint for a length $N$ sequence. Yet, by stacking such layers, our model yields state-of-the-art performance on a number of sequence classification and autoregressive density estimation tasks using CIFAR-10, ListOps, and PTB-XL datasets.
△ Less
Submitted 1 November, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift
Authors:
Kaizheng Wang
Abstract:
We develop and analyze a principled approach to kernel ridge regression under covariate shift. The goal is to learn a regression function with small mean squared error over a target distribution, based on unlabeled data from there and labeled data that may have a different feature distribution. We propose to split the labeled data into two subsets and conduct kernel ridge regression on them separa…
▽ More
We develop and analyze a principled approach to kernel ridge regression under covariate shift. The goal is to learn a regression function with small mean squared error over a target distribution, based on unlabeled data from there and labeled data that may have a different feature distribution. We propose to split the labeled data into two subsets and conduct kernel ridge regression on them separately to obtain a collection of candidate models and an imputation model. We use the latter to fill the missing labels and then select the best candidate model accordingly. Our non-asymptotic excess risk bounds show that in quite general scenarios, our estimator adapts to the structure of the target distribution as well as the covariate shift. It achieves the minimax optimal error rate up to a logarithmic factor. The use of pseudo-labels in model selection does not have major negative impacts.
△ Less
Submitted 14 March, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
Disentangled Representation for Causal Mediation Analysis
Authors:
Ziqi Xu,
Debo Cheng,
Jiuyong Li,
Jixue Liu,
Lin Liu,
Ke Wang
Abstract:
Estimating direct and indirect causal effects from observational data is crucial to understanding the causal mechanisms and predicting the behaviour under different interventions. Causal mediation analysis is a method that is often used to reveal direct and indirect effects. Deep learning shows promise in mediation analysis, but the current methods only assume latent confounders that affect treatm…
▽ More
Estimating direct and indirect causal effects from observational data is crucial to understanding the causal mechanisms and predicting the behaviour under different interventions. Causal mediation analysis is a method that is often used to reveal direct and indirect effects. Deep learning shows promise in mediation analysis, but the current methods only assume latent confounders that affect treatment, mediator and outcome simultaneously, and fail to identify different types of latent confounders (e.g., confounders that only affect the mediator or outcome). Furthermore, current methods are based on the sequential ignorability assumption, which is not feasible for dealing with multiple types of latent confounders. This work aims to circumvent the sequential ignorability assumption and applies the piecemeal deconfounding assumption as an alternative. We propose the Disentangled Mediation Analysis Variational AutoEncoder (DMAVAE), which disentangles the representations of latent confounders into three types to accurately estimate the natural direct effect, natural indirect effect and total effect. Experimental results show that the proposed method outperforms existing methods and has strong generalisation ability. We further apply the method to a real-world dataset to show its potential application.
△ Less
Submitted 15 December, 2023; v1 submitted 19 February, 2023;
originally announced February 2023.
-
Spatio-temporal Joint Modelling on Moderate and Extreme Air Pollution in Spain
Authors:
Kai Wang,
Chengxiu Ling,
Ying Chen,
Zhengjun Zhang
Abstract:
Very unhealthy air quality is consistently connected with numerous diseases. Appropriate extreme analysis and accurate predictions are in rising demand for exploring potential linked causes and for providing suggestions for the environmental agency in public policy strategy. This paper aims to model the spatial and temporal pattern of both moderate and extremely poor PM10 concentrations (of daily…
▽ More
Very unhealthy air quality is consistently connected with numerous diseases. Appropriate extreme analysis and accurate predictions are in rising demand for exploring potential linked causes and for providing suggestions for the environmental agency in public policy strategy. This paper aims to model the spatial and temporal pattern of both moderate and extremely poor PM10 concentrations (of daily mean) collected from 342 representative monitors distributed throughout mainland Spain from 2017 to 2021. We firstly propose and compare a series of Bayesian hierarchical generalized extreme models of annual maxima PM10 concentrations, including both the fixed effect of altitude, temperature, precipitation, vapour pressure and population density, as well as the spatio-temporal random effect with the Stochastic Partial Differential Equation (SPDE) approach and a lag-one dynamic auto-regressive component (AR(1)). Under WAIC, DIC and other criteria, the best model is selected with good predictive ability based on the first four-year data (2017--2020) for training and the last-year data (2021) for testing. We bring the structure of the best model to establish the joint Bayesian model of annual mean and annual maxima PM10 concentrations and provide evidence that certain predictors (precipitation, vapour pressure and population density) influence comparably while the other predictors (altitude and temperature) impact reversely in the different scaled PM10 concentrations. The findings are applied to identify the hot-spot regions with poor air quality using excursion functions specified at the grid level. It suggests that the community of Madrid and some sites in northwestern and southern Spain are likely to be exposed to severe air pollution, simultaneously exceeding the warning risk threshold.
△ Less
Submitted 24 August, 2023; v1 submitted 12 February, 2023;
originally announced February 2023.
-
Efficient Fraud Detection Using Deep Boosting Decision Trees
Authors:
Biao Xu,
Yao Wang,
Xiuwu Liao,
Kaidong Wang
Abstract:
Fraud detection is to identify, monitor, and prevent potentially fraudulent activities from complex data. The recent development and success in AI, especially machine learning, provides a new data-driven way to deal with fraud. From a methodological point of view, machine learning based fraud detection can be divided into two categories, i.e., conventional methods (decision tree, boosting...) and…
▽ More
Fraud detection is to identify, monitor, and prevent potentially fraudulent activities from complex data. The recent development and success in AI, especially machine learning, provides a new data-driven way to deal with fraud. From a methodological point of view, machine learning based fraud detection can be divided into two categories, i.e., conventional methods (decision tree, boosting...) and deep learning, both of which have significant limitations in terms of the lack of representation learning ability for the former and interpretability for the latter. Furthermore, due to the rarity of detected fraud cases, the associated data is usually imbalanced, which seriously degrades the performance of classification algorithms. In this paper, we propose deep boosting decision trees (DBDT), a novel approach for fraud detection based on gradient boosting and neural networks. In order to combine the advantages of both conventional methods and deep learning, we first construct soft decision tree (SDT), a decision tree structured model with neural networks as its nodes, and then ensemble SDTs using the idea of gradient boosting. In this way we embed neural networks into gradient boosting to improve its representation learning capability and meanwhile maintain the interpretability. Furthermore, aiming at the rarity of detected fraud cases, in the model training phase we propose a compositional AUC maximization approach to deal with data imbalances at algorithm level. Extensive experiments on several real-life fraud detection datasets show that DBDT can significantly improve the performance and meanwhile maintain good interpretability. Our code is available at https://github.com/freshmanXB/DBDT.
△ Less
Submitted 18 May, 2023; v1 submitted 12 February, 2023;
originally announced February 2023.
-
Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR
Authors:
Kaiwen Wang,
Nathan Kallus,
Wen Sun
Abstract:
In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $τ$. Starting with multi-arm bandits (MABs), we show the minimax CVaR regret rate is $Ω(\sqrt{τ^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes, and that it is achieved by an Upper Confidence Bound algorithm with a nov…
▽ More
In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $τ$. Starting with multi-arm bandits (MABs), we show the minimax CVaR regret rate is $Ω(\sqrt{τ^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes, and that it is achieved by an Upper Confidence Bound algorithm with a novel Bernstein bonus. For online RL in tabular Markov Decision Processes (MDPs), we show a minimax regret lower bound of $Ω(\sqrt{τ^{-1}SAK})$ (with normalized cumulative rewards), where $S$ is the number of states, and we propose a novel bonus-driven Value Iteration procedure. We show that our algorithm achieves the optimal regret of $\widetilde O(\sqrt{τ^{-1}SAK})$ under a continuity assumption and in general attains a near-optimal regret of $\widetilde O(τ^{-1}\sqrt{SAK})$, which is minimax-optimal for constant $τ$. This improves on the best available bounds. By discretizing rewards appropriately, our algorithms are computationally efficient.
△ Less
Submitted 24 May, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Learning Gaussian Mixtures Using the Wasserstein-Fisher-Rao Gradient Flow
Authors:
Yuling Yan,
Kaizheng Wang,
Philippe Rigollet
Abstract:
Gaussian mixture models form a flexible and expressive parametric family of distributions that has found applications in a wide variety of applications. Unfortunately, fitting these models to data is a notoriously hard problem from a computational perspective. Currently, only moment-based methods enjoy theoretical guarantees while likelihood-based methods are dominated by heuristics such as Expect…
▽ More
Gaussian mixture models form a flexible and expressive parametric family of distributions that has found applications in a wide variety of applications. Unfortunately, fitting these models to data is a notoriously hard problem from a computational perspective. Currently, only moment-based methods enjoy theoretical guarantees while likelihood-based methods are dominated by heuristics such as Expectation-Maximization that are known to fail in simple examples. In this work, we propose a new algorithm to compute the nonparametric maximum likelihood estimator (NPMLE) in a Gaussian mixture model. Our method is based on gradient descent over the space of probability measures equipped with the Wasserstein-Fisher-Rao geometry for which we establish convergence guarantees. In practice, it can be approximated using an interacting particle system where the weight and location of particles are updated alternately. We conduct extensive numerical experiments to confirm the effectiveness of the proposed algorithm compared not only to classical benchmarks but also to similar gradient descent algorithms with respect to simpler geometries. In particular, these simulations illustrate the benefit of updating both weight and location of the interacting particles.
△ Less
Submitted 4 January, 2023;
originally announced January 2023.
-
An introduction to optimization under uncertainty -- A short survey
Authors:
Keivan Shariatmadar,
Kaizheng Wang,
Calvin R. Hubbard,
Hans Hallez,
David Moens
Abstract:
Optimization equips engineers and scientists in a variety of fields with the ability to transcribe their problems into a generic formulation and receive optimal solutions with relative ease. Industries ranging from aerospace to robotics continue to benefit from advancements in optimization theory and the associated algorithmic developments. Nowadays, optimization is used in real time on autonomous…
▽ More
Optimization equips engineers and scientists in a variety of fields with the ability to transcribe their problems into a generic formulation and receive optimal solutions with relative ease. Industries ranging from aerospace to robotics continue to benefit from advancements in optimization theory and the associated algorithmic developments. Nowadays, optimization is used in real time on autonomous systems acting in safety critical situations, such as self-driving vehicles. It has become increasingly more important to produce robust solutions by incorporating uncertainty into optimization programs. This paper provides a short survey about the state of the art in optimization under uncertainty. The paper begins with a brief overview of the main classes of optimization without uncertainty. The rest of the paper focuses on the different methods for handling both aleatoric and epistemic uncertainty. Many of the applications discussed in this paper are within the domain of control. The goal of this survey paper is to briefly touch upon the state of the art in a variety of different methods and refer the reader to other literature for more in-depth treatments of the topics discussed here.
△ Less
Submitted 1 December, 2022;
originally announced December 2022.
-
Functional Bayesian Networks for Discovering Causality from Multivariate Functional Data
Authors:
Fangting Zhou,
Kejun He,
Kunbo Wang,
Yanxun Xu,
Yang Ni
Abstract:
Multivariate functional data arise in a wide range of applications. One fundamental task is to understand the causal relationships among these functional objects of interest, which has not yet been fully explored. In this article, we develop a novel Bayesian network model for multivariate functional data where the conditional independence and causal structure are both encoded by a directed acyclic…
▽ More
Multivariate functional data arise in a wide range of applications. One fundamental task is to understand the causal relationships among these functional objects of interest, which has not yet been fully explored. In this article, we develop a novel Bayesian network model for multivariate functional data where the conditional independence and causal structure are both encoded by a directed acyclic graph. Specifically, we allow the functional objects to deviate from Gaussian process, which is adopted by most existing functional data analysis models. The more reasonable non-Gaussian assumption is the key for unique causal structure identification even when the functions are measured with noises. A fully Bayesian framework is designed to infer the functional Bayesian network model with natural uncertainty quantification through posterior summaries. Simulation studies and real data examples are used to demonstrate the practical utility of the proposed model.
△ Less
Submitted 23 October, 2022;
originally announced October 2022.
-
Adaptive Data Fusion for Multi-task Non-smooth Optimization
Authors:
Henry Lam,
Kaizheng Wang,
Yuhang Wu,
Yichen Zhang
Abstract:
We study the problem of multi-task non-smooth optimization that arises ubiquitously in statistical learning, decision-making and risk management. We develop a data fusion approach that adaptively leverages commonalities among a large number of objectives to improve sample efficiency while tackling their unknown heterogeneities. We provide sharp statistical guarantees for our approach. Numerical ex…
▽ More
We study the problem of multi-task non-smooth optimization that arises ubiquitously in statistical learning, decision-making and risk management. We develop a data fusion approach that adaptively leverages commonalities among a large number of objectives to improve sample efficiency while tackling their unknown heterogeneities. We provide sharp statistical guarantees for our approach. Numerical experiments on both synthetic and real data demonstrate significant advantages of our approach over benchmarks.
△ Less
Submitted 21 October, 2022;
originally announced October 2022.
-
Bayesian Tensor-on-Tensor Regression with Efficient Computation
Authors:
Kunbo Wang,
Yanxun Xu
Abstract:
We propose a Bayesian tensor-on-tensor regression approach to predict a multidimensional array (tensor) of arbitrary dimensions from another tensor of arbitrary dimensions, building upon the Tucker decomposition of the regression coefficient tensor. Traditional tensor regression methods making use of the Tucker decomposition either assume the dimension of the core tensor to be known or estimate it…
▽ More
We propose a Bayesian tensor-on-tensor regression approach to predict a multidimensional array (tensor) of arbitrary dimensions from another tensor of arbitrary dimensions, building upon the Tucker decomposition of the regression coefficient tensor. Traditional tensor regression methods making use of the Tucker decomposition either assume the dimension of the core tensor to be known or estimate it via cross-validation or some model selection criteria. However, no existing method can simultaneously estimate the model dimension (the dimension of the core tensor) and other model parameters. To fill this gap, we develop an efficient Markov Chain Monte Carlo (MCMC) algorithm to estimate both the model dimension and parameters for posterior inference. Besides the MCMC sampler, we also develop an ultra-fast optimization-based computing algorithm wherein the maximum a posteriori estimators for parameters are computed, and the model dimension is optimized via a simulated annealing algorithm. The proposed Bayesian framework provides a natural way for uncertainty quantification. Through extensive simulation studies, we evaluate the proposed Bayesian tensor-on-tensor regression model and show its superior performance compared to alternative methods. We also demonstrate its practical effectiveness by applying it to two real-world datasets, including facial imaging data and 3D motion data.
△ Less
Submitted 20 October, 2022;
originally announced October 2022.
-
Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL
Authors:
Fengzhuo Zhang,
Boyi Liu,
Kaixin Wang,
Vincent Y. F. Tan,
Zhuoran Yang,
Zhaoran Wang
Abstract:
The cooperative Multi-A gent R einforcement Learning (MARL) with permutation invariant agents framework has achieved tremendous empirical successes in real-world applications. Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of the relational reasoning in existing works. In this paper, we verify that the transf…
▽ More
The cooperative Multi-A gent R einforcement Learning (MARL) with permutation invariant agents framework has achieved tremendous empirical successes in real-world applications. Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of the relational reasoning in existing works. In this paper, we verify that the transformer implements complex relational reasoning, and we propose and analyze model-free and model-based offline MARL algorithms with the transformer approximators. We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents respectively, which mitigates the curse of many agents. These results are consequences of a novel generalization error bound of the transformer and a novel analysis of the Maximum Likelihood Estimate (MLE) of the system dynamics with the transformer. Our model-based algorithm is the first provably efficient MARL algorithm that explicitly exploits the permutation invariance of the agents. Our improved generalization bound may be of independent interest and is applicable to other regression problems related to the transformer beyond MARL.
△ Less
Submitted 16 October, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Learning Bellman Complete Representations for Offline Policy Evaluation
Authors:
Jonathan D. Chang,
Kaiwen Wang,
Nathan Kallus,
Wen Sun
Abstract:
We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE). Recent work shows that, in contrast to supervised learning, realizability of the Q-function is not enough for learning it. Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage. Prior work often assumes that representations…
▽ More
We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE). Recent work shows that, in contrast to supervised learning, realizability of the Q-function is not enough for learning it. Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage. Prior work often assumes that representations satisfying these conditions are given, with results being mostly theoretical in nature. In this work, we propose BCRL, which directly learns from data an approximately linear Bellman complete representation with good coverage. With this learned representation, we perform OPE using Least Square Policy Evaluation (LSPE) with linear functions in our learned representation. We present an end-to-end theoretical analysis, showing that our two-stage algorithm enjoys polynomial sample complexity provided some representation in the rich class considered is linear Bellman complete. Empirically, we extensively evaluate our algorithm on challenging, image-based continuous control tasks from the Deepmind Control Suite. We show our representation enables better OPE compared to previous representation learning methods developed for off-policy RL (e.g., CURL, SPR). BCRL achieve competitive OPE error with the state-of-the-art method Fitted Q-Evaluation (FQE), and beats FQE when evaluating beyond the initial state distribution. Our ablations show that both linear Bellman complete and coverage components of our method are crucial.
△ Less
Submitted 12 July, 2022;
originally announced July 2022.
-
Generational Differences in Automobility: Comparing America's Millennials and Gen Xers Using Gradient Boosting Decision Trees
Authors:
Kailai Wang,
Xize Wang
Abstract:
Whether the Millennials are less auto-centric than the previous generations has been widely discussed in the literature. Most existing studies use regression models and assume that all factors are linear-additive in contributing to the young adults' driving behaviors. This study relaxes this assumption by applying a non-parametric statistical learning method, namely the gradient boosting decision…
▽ More
Whether the Millennials are less auto-centric than the previous generations has been widely discussed in the literature. Most existing studies use regression models and assume that all factors are linear-additive in contributing to the young adults' driving behaviors. This study relaxes this assumption by applying a non-parametric statistical learning method, namely the gradient boosting decision trees (GBDT). Using U.S. nationwide travel surveys for 2001 and 2017, this study examines the non-linear dose-response effects of lifecycle, socio-demographic and residential factors on daily driving distances of Millennial and Gen-X young adults. Holding all other factors constant, Millennial young adults had shorter predicted daily driving distances than their Gen-X counterparts. Besides, residential and economic factors explain around 50% of young adults' daily driving distances, while the collective contributions for life course events and demographics are about 33%. This study also identifies the density ranges for formulating effective land use policies aiming at reducing automobile travel demand.
△ Less
Submitted 19 June, 2022;
originally announced June 2022.
-
Simulated redistricting plans for the analysis and evaluation of redistricting in the United States
Authors:
Cory McCartan,
Christopher T. Kenny,
Tyler Simko,
George Garcia III,
Kevin Wang,
Melissa Wu,
Shiro Kuriwaki,
Kosuke Imai
Abstract:
This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standa…
▽ More
This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data.
△ Less
Submitted 20 October, 2022; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
Authors:
**g Ma,
Xiang Xiang,
Ke Wang,
Yuchuan Wu,
Yongbin Li
Abstract:
Black-Box Knowledge Distillation (B2KD) is a formulated problem for cloud-to-edge model compression with invisible data and models hosted on the server. B2KD faces challenges such as limited Internet exchange and edge-cloud disparity of data distributions. In this paper, we formalize a two-step workflow consisting of deprivatization and distillation, and theoretically provide a new optimization di…
▽ More
Black-Box Knowledge Distillation (B2KD) is a formulated problem for cloud-to-edge model compression with invisible data and models hosted on the server. B2KD faces challenges such as limited Internet exchange and edge-cloud disparity of data distributions. In this paper, we formalize a two-step workflow consisting of deprivatization and distillation, and theoretically provide a new optimization direction from logits to cell boundary different from direct logits alignment. With its guidance, we propose a new method Map**-Emulation KD (MEKD) that distills a black-box cumbersome model into a lightweight one. Our method does not differentiate between treating soft or hard responses, and consists of: 1) deprivatization: emulating the inverse map** of the teacher function with a generator, and 2) distillation: aligning low-dimensional logits of the teacher and student models by reducing the distance of high-dimensional image points. For different teacher-student pairs, our method yields inspiring distillation performance on various benchmarks, and outperforms the previous state-of-the-art approaches.
△ Less
Submitted 30 March, 2024; v1 submitted 20 May, 2022;
originally announced May 2022.
-
Sequential Importance Sampling for Hybrid Model Bayesian Inference to Support Bioprocess Mechanism Learning and Robust Control
Authors:
Wei Xie,
Keqi Wang,
Hua Zheng,
Ben Feng
Abstract:
Driven by the critical needs of biomanufacturing 4.0, we introduce a probabilistic knowledge graph hybrid model characterizing the risk- and science-based understanding of bioprocess mechanisms. It can faithfully capture the important properties, including nonlinear reactions, partially observed state, and nonstationary dynamics. Given very limited real process observations, we derive a posterior…
▽ More
Driven by the critical needs of biomanufacturing 4.0, we introduce a probabilistic knowledge graph hybrid model characterizing the risk- and science-based understanding of bioprocess mechanisms. It can faithfully capture the important properties, including nonlinear reactions, partially observed state, and nonstationary dynamics. Given very limited real process observations, we derive a posterior distribution quantifying model estimation uncertainty. To avoid the evaluation of intractable likelihoods, Approximate Bayesian Computation sampling with Sequential Monte Carlo (ABC-SMC) is utilized to approximate the posterior distribution. Under high stochastic and model uncertainties, it is computationally expensive to match output trajectories. Therefore, we create a linear Gaussian dynamic Bayesian network (LG-DBN) auxiliary likelihood-based ABC-SMC approach. Through matching the summary statistics driven through LG-DBN likelihood that can capture critical interactions and variations, the proposed algorithm can accelerate hybrid model inference, support process monitoring, and facilitate mechanism learning and robust control.
△ Less
Submitted 29 September, 2022; v1 submitted 4 May, 2022;
originally announced May 2022.
-
Stochastic Simulation Uncertainty Analysis to Accelerate Flexible Biomanufacturing Process Development
Authors:
Wei Xie,
Russell R. Barton,
Barry L. Nelson,
Keqi Wang
Abstract:
Motivated by critical challenges and needs from biopharmaceuticals manufacturing, we propose a general metamodel-assisted stochastic simulation uncertainty analysis framework to accelerate the development of a simulation model with modular design for flexible production processes. There are often very limited process observations. Thus, there exist both simulation and model uncertainties in the sy…
▽ More
Motivated by critical challenges and needs from biopharmaceuticals manufacturing, we propose a general metamodel-assisted stochastic simulation uncertainty analysis framework to accelerate the development of a simulation model with modular design for flexible production processes. There are often very limited process observations. Thus, there exist both simulation and model uncertainties in the system performance estimates. In biopharmaceutical manufacturing, model uncertainty often dominates. The proposed framework can produce a confidence interval that accounts for simulation and model uncertainties by using a metamodel-assisted bootstrap** approach. Furthermore, a variance decomposition is utilized to estimate the relative contributions from each source of model uncertainty, as well as simulation uncertainty. This information can be used to improve the system mean performance estimation. Asymptotic analysis provides theoretical support for our approach, while the empirical study demonstrates that it has good finite-sample performance.
△ Less
Submitted 3 September, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning
Authors:
Nathan Kallus,
Xiaojie Mao,
Kaiwen Wang,
Zhengyuan Zhou
Abstract:
Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions, which is crucial in applications where online experimentation is limited. However, depending entirely on logged data, OPE/L is sensitive to environment distribution shifts -- discrepancies between the data-generating environment and that where policies are deployed. \citet{si2020distributional} prop…
▽ More
Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions, which is crucial in applications where online experimentation is limited. However, depending entirely on logged data, OPE/L is sensitive to environment distribution shifts -- discrepancies between the data-generating environment and that where policies are deployed. \citet{si2020distributional} proposed distributionally robust OPE/L (DROPE/L) to address this, but the proposal relies on inverse-propensity weighting, whose estimation error and regret will deteriorate if propensities are nonparametrically estimated and whose variance is suboptimal even if not. For standard, non-robust, OPE/L, this is solved by doubly robust (DR) methods, but they do not naturally extend to the more complex DROPE/L, which involves a worst-case expectation. In this paper, we propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets. For evaluation, we propose Localized Doubly Robust DROPE (LDR$^2$OPE) and show that it achieves semiparametric efficiency under weak product rates conditions. Thanks to a localization technique, LDR$^2$OPE only requires fitting a small number of regressions, just like DR methods for standard OPE. For learning, we propose Continuum Doubly Robust DROPL (CDR$^2$OPL) and show that, under a product rate condition involving a continuum of regressions, it enjoys a fast regret rate of $\mathcal{O}\left(N^{-1/2}\right)$ even when unknown propensities are nonparametrically estimated. We empirically validate our algorithms in simulations and further extend our results to general $f$-divergence uncertainty sets.
△ Less
Submitted 18 July, 2022; v1 submitted 19 February, 2022;
originally announced February 2022.
-
Adaptive and Robust Multi-Task Learning
Authors:
Yaqi Duan,
Kaizheng Wang
Abstract:
We study the multi-task learning problem that aims to simultaneously analyze multiple datasets collected from different sources and learn one model for each of them. We propose a family of adaptive methods that automatically utilize possible similarities among those tasks while carefully handling their differences. We derive sharp statistical guarantees for the methods and prove their robustness a…
▽ More
We study the multi-task learning problem that aims to simultaneously analyze multiple datasets collected from different sources and learn one model for each of them. We propose a family of adaptive methods that automatically utilize possible similarities among those tasks while carefully handling their differences. We derive sharp statistical guarantees for the methods and prove their robustness against outlier tasks. Numerical experiments on synthetic and real datasets demonstrate the efficacy of our new methods.
△ Less
Submitted 16 September, 2023; v1 submitted 10 February, 2022;
originally announced February 2022.
-
Control Variate Polynomial Chaos: Optimal Fusion of Sampling and Surrogates for Multifidelity Uncertainty Quantification
Authors:
Hang Yang,
Yuji Fujii,
K. W. Wang,
Alex A. Gorodetsky
Abstract:
We present a hybrid sampling-surrogate approach for reducing the computational expense of uncertainty quantification in nonlinear dynamical systems. Our motivation is to enable rapid uncertainty quantification in complex mechanical systems such as automotive propulsion systems. Our approach is to build upon ideas from multifidelity uncertainty quantification to leverage the benefits of both sampli…
▽ More
We present a hybrid sampling-surrogate approach for reducing the computational expense of uncertainty quantification in nonlinear dynamical systems. Our motivation is to enable rapid uncertainty quantification in complex mechanical systems such as automotive propulsion systems. Our approach is to build upon ideas from multifidelity uncertainty quantification to leverage the benefits of both sampling and surrogate modeling, while mitigating their downsides. In particular, the surrogate model is selected to exploit problem structure, such as smoothness, and offers a highly correlated information source to the original nonlinear dynamical system. We utilize an intrusive generalized Polynomial Chaos surrogate because it avoids any statistical errors in its construction and provides analytic estimates of output statistics. We then leverage a Monte Carlo-based Control Variate technique to correct the bias caused by the surrogate approximation error. The primary theoretical contribution of this work is the analysis and solution of an estimator design strategy that optimally balances the computational effort needed to adapt a surrogate compared with sampling the original expensive nonlinear system. While previous works have similarly combined surrogates and sampling, to our best knowledge this work is the first to provide rigorous analysis of estimator design. We deploy our approach on multiple examples stemming from the simulation of mechanical automotive propulsion system models. We show that the estimator is able to achieve orders of magnitude reduction in mean squared error of statistics estimation in some cases under comparable costs of purely sampling or purely surrogate approaches.
△ Less
Submitted 25 January, 2022;
originally announced January 2022.
-
Disentanglement and Generalization Under Correlation Shifts
Authors:
Christina M. Funke,
Paul Vicol,
Kuan-Chieh Wang,
Matthias Kümmerer,
Richard Zemel,
Matthias Bethge
Abstract:
Correlations between factors of variation are prevalent in real-world data. Exploiting such correlations may increase predictive performance on noisy data; however, often correlations are not robust (e.g., they may change between domains, datasets, or applications) and models that exploit them do not generalize when correlations shift. Disentanglement methods aim to learn representations which cap…
▽ More
Correlations between factors of variation are prevalent in real-world data. Exploiting such correlations may increase predictive performance on noisy data; however, often correlations are not robust (e.g., they may change between domains, datasets, or applications) and models that exploit them do not generalize when correlations shift. Disentanglement methods aim to learn representations which capture different factors of variation in latent subspaces. A common approach involves minimizing the mutual information between latent subspaces, such that each encodes a single underlying attribute. However, this fails when attributes are correlated. We solve this problem by enforcing independence between subspaces conditioned on the available attributes, which allows us to remove only dependencies that are not due to the correlation structure present in the training data. We achieve this via an adversarial approach to minimize the conditional mutual information (CMI) between subspaces with respect to categorical variables. We first show theoretically that CMI minimization is a good objective for robust disentanglement on linear problems. We then apply our method on real-world datasets based on MNIST and CelebA, and show that it yields models that are disentangled and robust under correlation shift, including in weakly supervised settings.
△ Less
Submitted 23 December, 2022; v1 submitted 29 December, 2021;
originally announced December 2021.
-
Is Importance Weighting Incompatible with Interpolating Classifiers?
Authors:
Ke Alexander Wang,
Niladri S. Chatterji,
Saminul Haque,
Tatsunori Hashimoto
Abstract:
Importance weighting is a classic technique to handle distribution shifts. However, prior work has presented strong empirical and theoretical evidence demonstrating that importance weights can have little to no effect on overparameterized neural networks. Is importance weighting truly incompatible with the training of overparameterized neural networks? Our paper answers this in the negative. We sh…
▽ More
Importance weighting is a classic technique to handle distribution shifts. However, prior work has presented strong empirical and theoretical evidence demonstrating that importance weights can have little to no effect on overparameterized neural networks. Is importance weighting truly incompatible with the training of overparameterized neural networks? Our paper answers this in the negative. We show that importance weighting fails not because of the overparameterization, but instead, as a result of using exponentially-tailed losses like the logistic or cross-entropy loss. As a remedy, we show that polynomially-tailed losses restore the effects of importance reweighting in correcting distribution shift in overparameterized models. We characterize the behavior of gradient descent on importance weighted polynomially-tailed losses with overparameterized linear models, and theoretically demonstrate the advantage of using polynomially-tailed losses in a label shift setting. Surprisingly, our theory shows that using weights that are obtained by exponentiating the classical unbiased importance weights can improve performance. Finally, we demonstrate the practical value of our analysis with neural network experiments on a subpopulation shift and a label shift dataset. When reweighted, our loss function can outperform reweighted cross-entropy by as much as 9% in test accuracy. Our loss function also gives test accuracies comparable to, or even exceeding, well-tuned state-of-the-art methods for correcting distribution shifts.
△ Less
Submitted 4 March, 2022; v1 submitted 24 December, 2021;
originally announced December 2021.
-
Differentially Private Ensemble Classifiers for Data Streams
Authors:
Lovedeep Gondara,
Ke Wang,
Ricardo Silva Carvalho
Abstract:
Learning from continuous data streams via classification/regression is prevalent in many domains. Adapting to evolving data characteristics (concept drift) while protecting data owners' private information is an open challenge. We present a differentially private ensemble solution to this problem with two distinguishing features: it allows an \textit{unbounded} number of ensemble updates to deal w…
▽ More
Learning from continuous data streams via classification/regression is prevalent in many domains. Adapting to evolving data characteristics (concept drift) while protecting data owners' private information is an open challenge. We present a differentially private ensemble solution to this problem with two distinguishing features: it allows an \textit{unbounded} number of ensemble updates to deal with the potentially never-ending data streams under a fixed privacy budget, and it is \textit{model agnostic}, in that it treats any pre-trained differentially private classification/regression model as a black-box. Our method outperforms competitors on real-world and simulated datasets for varying settings of privacy, concept drift, and data distribution.
△ Less
Submitted 8 December, 2021;
originally announced December 2021.
-
Interval Estimation of Relative Risks for Combined Unilateral and Bilateral Correlated Data
Authors:
Kejia Wang,
Chang-Xing Ma
Abstract:
Measurements are generally collected as unilateral or bilateral data in clinical trials or observational studies. For example, in ophthalmology studies, the primary outcome is often obtained from one eye or both eyes of an individual. In medical studies, the relative risk is usually the parameter of interest and is commonly used. In this article, we develop three confidence intervals for the relat…
▽ More
Measurements are generally collected as unilateral or bilateral data in clinical trials or observational studies. For example, in ophthalmology studies, the primary outcome is often obtained from one eye or both eyes of an individual. In medical studies, the relative risk is usually the parameter of interest and is commonly used. In this article, we develop three confidence intervals for the relative risk for combined unilateral and bilateral correlated data under the equal dependence assumption. The proposed confidence intervals are based on maximum likelihood estimates of parameters derived using the Fisher scoring method. Simulation studies are conducted to evaluate the performance of proposed confidence intervals with respect to the empirical coverage probability, the mean interval width, and the ratio of mesial non-coverage probability to the distal non-coverage probability. We also compare the proposed methods with the confidence interval based on the method of variance estimates recovery and the confidence interval obtained from the modified Poisson regression model with correlated binary data. We recommend the score confidence interval for general applications because it best controls converge probabilities at the 95% level with reasonable mean interval width. We illustrate the methods with a real-world example.
△ Less
Submitted 28 October, 2021;
originally announced October 2021.
-
Clustering a Mixture of Gaussians with Unknown Covariance
Authors:
Damek Davis,
Mateo Díaz,
Kaizheng Wang
Abstract:
We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We prove its solutions achieve the optimal misclassification rate when the number of sam…
▽ More
We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We prove its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-cut problem appears to be computationally intractable. To overcome this, we develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we gather numerical and theoretical evidence that supports the existence of a statistical-computational gap. Finally, we generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights. It enjoys similar optimality guarantees for mixtures of distributions that satisfy a transportation-cost inequality, encompassing Gaussian and strongly log-concave distributions.
△ Less
Submitted 29 November, 2021; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Effective Streaming Low-tubal-rank Tensor Approximation via Frequent Directions
Authors:
Qianxin Yi,
Chenhao Wang,
Kaidong Wang,
Yao Wang
Abstract:
Low-tubal-rank tensor approximation has been proposed to analyze large-scale and multi-dimensional data. However, finding such an accurate approximation is challenging in the streaming setting, due to the limited computational resources. To alleviate this issue, this paper extends a popular matrix sketching technique, namely Frequent Directions, for constructing an efficient and accurate low-tubal…
▽ More
Low-tubal-rank tensor approximation has been proposed to analyze large-scale and multi-dimensional data. However, finding such an accurate approximation is challenging in the streaming setting, due to the limited computational resources. To alleviate this issue, this paper extends a popular matrix sketching technique, namely Frequent Directions, for constructing an efficient and accurate low-tubal-rank tensor approximation from streaming data based on the tensor Singular Value Decomposition (t-SVD). Specifically, the new algorithm allows the tensor data to be observed slice by slice, but only needs to maintain and incrementally update a much smaller sketch which could capture the principal information of the original tensor. The rigorous theoretical analysis shows that the approximation error of the new algorithm can be arbitrarily small when the sketch size grows linearly. Extensive experimental results on both synthetic and real multi-dimensional data further reveal the superiority of the proposed algorithm compared with other sketching algorithms for getting low-tubal-rank approximation, in terms of both efficiency and accuracy.
△ Less
Submitted 23 August, 2021;
originally announced August 2021.
-
Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing
Authors:
Kaixin Wang,
Kuangqi Zhou,
Qixin Zhang,
Jie Shao,
Bryan Hooi,
Jiashi Feng
Abstract:
The Laplacian representation recently gains increasing attention for reinforcement learning as it provides succinct and informative representation for states, by taking the eigenvectors of the Laplacian matrix of the state-transition graph as state embeddings. Such representation captures the geometry of the underlying state space and is beneficial to RL tasks such as option discovery and reward s…
▽ More
The Laplacian representation recently gains increasing attention for reinforcement learning as it provides succinct and informative representation for states, by taking the eigenvectors of the Laplacian matrix of the state-transition graph as state embeddings. Such representation captures the geometry of the underlying state space and is beneficial to RL tasks such as option discovery and reward sha**. To approximate the Laplacian representation in large (or even continuous) state spaces, recent works propose to minimize a spectral graph drawing objective, which however has infinitely many global minimizers other than the eigenvectors. As a result, their learned Laplacian representation may differ from the ground truth. To solve this problem, we reformulate the graph drawing objective into a generalized form and derive a new learning objective, which is proved to have eigenvectors as its unique global minimizer. It enables learning high-quality Laplacian representations that faithfully approximate the ground truth. We validate this via comprehensive experiments on a set of gridworld and continuous control environments. Moreover, we show that our learned Laplacian representations lead to more exploratory options and better reward sha**.
△ Less
Submitted 12 July, 2021;
originally announced July 2021.
-
Harnessing Heterogeneity: Learning from Decomposed Feedback in Bayesian Modeling
Authors:
Kai Wang,
Bryan Wilder,
Sze-chuan Suen,
Bistra Dilkina,
Milind Tambe
Abstract:
There is significant interest in learning and optimizing a complex system composed of multiple sub-components, where these components may be agents or autonomous sensors. Among the rich literature on this topic, agent-based and domain-specific simulations can capture complex dynamics and subgroup interaction, but optimizing over such simulations can be computationally and algorithmically challengi…
▽ More
There is significant interest in learning and optimizing a complex system composed of multiple sub-components, where these components may be agents or autonomous sensors. Among the rich literature on this topic, agent-based and domain-specific simulations can capture complex dynamics and subgroup interaction, but optimizing over such simulations can be computationally and algorithmically challenging. Bayesian approaches, such as Gaussian processes (GPs), can be used to learn a computationally tractable approximation to the underlying dynamics but typically neglect the detailed information about subgroups in the complicated system. We attempt to find the best of both worlds by proposing the idea of decomposed feedback, which captures group-based heterogeneity and dynamics. We introduce a novel decomposed GP regression to incorporate the subgroup decomposed feedback. Our modified regression has provably lower variance -- and thus a more accurate posterior -- compared to previous approaches; it also allows us to introduce a decomposed GP-UCB optimization algorithm that leverages subgroup feedback. The Bayesian nature of our method makes the optimization algorithm trackable with a theoretical guarantee on convergence and no-regret property. To demonstrate the wide applicability of this work, we execute our algorithm on two disparate social problems: infectious disease control in a heterogeneous population and allocation of distributed weather sensors. Experimental results show that our new method provides significant improvement compared to the state-of-the-art.
△ Less
Submitted 6 July, 2021;
originally announced July 2021.
-
Universal Consistency of Deep Convolutional Neural Networks
Authors:
Shao-Bo Lin,
Kaidong Wang,
Yao Wang,
Ding-Xuan Zhou
Abstract:
Compared with avid research activities of deep convolutional neural networks (DCNNs) in practice, the study of theoretical behaviors of DCNNs lags heavily behind. In particular, the universal consistency of DCNNs remains open. In this paper, we prove that implementing empirical risk minimization on DCNNs with expansive convolution (with zero-padding) is strongly universally consistent. Motivated b…
▽ More
Compared with avid research activities of deep convolutional neural networks (DCNNs) in practice, the study of theoretical behaviors of DCNNs lags heavily behind. In particular, the universal consistency of DCNNs remains open. In this paper, we prove that implementing empirical risk minimization on DCNNs with expansive convolution (with zero-padding) is strongly universally consistent. Motivated by the universal consistency, we conduct a series of experiments to show that without any fully connected layers, DCNNs with expansive convolution perform not worse than the widely used deep neural networks with hybrid structure containing contracting (without zero-padding) convolution layers and several fully connected layers.
△ Less
Submitted 23 June, 2021;
originally announced June 2021.