-
Practical identifiability and parameter estimation of compartmental epidemiological models
Authors:
Q. Y. Chen,
Z. Rapti,
Y. Drossinos,
J. Cuevas-Maraver,
G. A. Kevrekidis,
P. G. Kevrekidis
Abstract:
Practical parameter identifiability in ODE-based epidemiological models is a known issue, yet one that merits further study. It is essentially ubiquitous due to noise and errors in real data. In this study, to avoid uncertainty stemming from data of unknown quality, simulated data with added noise are used to investigate practical identifiability in two distinct epidemiological models. Particular…
▽ More
Practical parameter identifiability in ODE-based epidemiological models is a known issue, yet one that merits further study. It is essentially ubiquitous due to noise and errors in real data. In this study, to avoid uncertainty stemming from data of unknown quality, simulated data with added noise are used to investigate practical identifiability in two distinct epidemiological models. Particular emphasis is placed on the role of initial conditions, which are assumed unknown, except those that are directly measured. Instead of just focusing on one method of estimation, we use and compare results from various broadly used methods, including maximum likelihood and Markov Chain Monte Carlo (MCMC) estimation.
Among other findings, our analysis revealed that the MCMC estimator is overall more robust than the point estimators considered. Its estimates and predictions are improved when the initial conditions of certain compartments are fixed so that the model becomes globally identifiable. For the point estimators, whether fixing or fitting the that are not directly measured improves parameter estimates is model-dependent. Specifically, in the standard SEIR model, fixing the initial condition for the susceptible population S(0) improved parameter estimates, while this was not true when fixing the initial condition of the asymptomatic population in a more involved model. Our study corroborates the change in quality of parameter estimates upon usage of pre-peak or post-peak time-series under consideration. Finally, our examples suggest that in the presence of significantly noisy data, the value of structural identifiability is moot.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Anatomy of Elite and Mass Polarization in Social Networks
Authors:
Ali Salloum,
Ted Hsuan Yun Chen,
Mikko Kivelä
Abstract:
Existing methods for quantifying polarization in social networks typically report a single value describing the amount of polarization in a social system. While this approach can be used to confirm the observation that many societies have witnessed an increase in political polarization in recent years, it misses the complexities that could be used to understand the reasons behind this phenomenon.…
▽ More
Existing methods for quantifying polarization in social networks typically report a single value describing the amount of polarization in a social system. While this approach can be used to confirm the observation that many societies have witnessed an increase in political polarization in recent years, it misses the complexities that could be used to understand the reasons behind this phenomenon. Notably, opposing groups can have unequal impact on polarization, and the elites are often understood to be more divided than the masses, making it critical to differentiate their roles in polarized systems. We propose a method to characterize these distinct hierarchies in polarized networks, enabling separate polarization measurements for these groups within a single social system. Applied to polarized topics in the Finnish Twittersphere surrounding the 2019 and 2023 parliamentary elections, our analysis reveals valuable insights: 1) The impact of opposing groups on observed polarization is rarely balanced, and 2) while the elite strongly contributes to structural polarization and consistently display greater alignment across various topics, the masses have also recently experienced a surge in issue alignment, a special form of polarization. Our findings suggest that the masses may not be as immune to an increasingly polarized environment as previously thought.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Learning High-dimensional Latent Variable Models via Doubly Stochastic Optimisation by Unadjusted Langevin
Authors:
Motonori Oka,
Yunxiao Chen,
Irini Moustaki
Abstract:
Latent variable models are widely used in social and behavioural sciences, such as education, psychology, and political science. In recent years, high-dimensional latent variable models have become increasingly common for analysing large and complex data. Estimating high-dimensional latent variable models using marginal maximum likelihood is computationally demanding due to the complexity of integ…
▽ More
Latent variable models are widely used in social and behavioural sciences, such as education, psychology, and political science. In recent years, high-dimensional latent variable models have become increasingly common for analysing large and complex data. Estimating high-dimensional latent variable models using marginal maximum likelihood is computationally demanding due to the complexity of integrals involved. To address this challenge, stochastic optimisation, which combines stochastic approximation and sampling techniques, has been shown to be effective. This method iterates between two steps -- (1) sampling the latent variables from their posterior distribution based on the current parameter estimate, and (2) updating the fixed parameters using an approximate stochastic gradient constructed from the latent variable samples. In this paper, we propose a computationally more efficient stochastic optimisation algorithm. This improvement is achieved through the use of a minibatch of observations when sampling latent variables and constructing stochastic gradients, and an unadjusted Langevin sampler that utilises the gradient of the negative complete-data log-likelihood to sample latent variables. Theoretical results are established for the proposed algorithm, showing that the iterative parameter update converges to the marginal maximum likelihood estimate as the number of iterations goes to infinity. Furthermore, the proposed algorithm is shown to scale well to high-dimensional settings through simulation studies and a personality test application with 30,000 respondents, 300 items, and 30 latent dimensions.
△ Less
Submitted 14 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Learning in Feature Spaces via Coupled Covariances: Asymmetric Kernel SVD and Nyström method
Authors:
Qinghua Tao,
Francesco Tonin,
Alex Lambert,
Yingyi Chen,
Panagiotis Patrinos,
Johan A. K. Suykens
Abstract:
In contrast with Mercer kernel-based approaches as used e.g., in Kernel Principal Component Analysis (KPCA), it was previously shown that Singular Value Decomposition (SVD) inherently relates to asymmetric kernels and Asymmetric Kernel Singular Value Decomposition (KSVD) has been proposed. However, the existing formulation to KSVD cannot work with infinite-dimensional feature map**s, the variati…
▽ More
In contrast with Mercer kernel-based approaches as used e.g., in Kernel Principal Component Analysis (KPCA), it was previously shown that Singular Value Decomposition (SVD) inherently relates to asymmetric kernels and Asymmetric Kernel Singular Value Decomposition (KSVD) has been proposed. However, the existing formulation to KSVD cannot work with infinite-dimensional feature map**s, the variational objective can be unbounded, and needs further numerical evaluation and exploration towards machine learning. In this work, i) we introduce a new asymmetric learning paradigm based on coupled covariance eigenproblem (CCE) through covariance operators, allowing infinite-dimensional feature maps. The solution to CCE is ultimately obtained from the SVD of the induced asymmetric kernel matrix, providing links to KSVD. ii) Starting from the integral equations corresponding to a pair of coupled adjoint eigenfunctions, we formalize the asymmetric Nyström method through a finite sample approximation to speed up training. iii) We provide the first empirical evaluations verifying the practical utility and benefits of KSVD and compare with methods resorting to symmetrization or linear SVD across multiple tasks.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
How Interpretable Are Interpretable Graph Neural Networks?
Authors:
Yongqiang Chen,
Yatao Bian,
Bo Han,
James Cheng
Abstract:
Interpretable graph neural networks (XGNNs ) are widely adopted in various scientific applications involving graph-structured data. Existing XGNNs predominantly adopt the attention-based mechanism to learn edge or node importance for extracting and making predictions with the interpretable subgraph. However, the representational properties and limitations of these methods remain inadequately explo…
▽ More
Interpretable graph neural networks (XGNNs ) are widely adopted in various scientific applications involving graph-structured data. Existing XGNNs predominantly adopt the attention-based mechanism to learn edge or node importance for extracting and making predictions with the interpretable subgraph. However, the representational properties and limitations of these methods remain inadequately explored. In this work, we present a theoretical framework that formulates interpretable subgraph learning with the multilinear extension of the subgraph distribution, coined as subgraph multilinear extension (SubMT). Extracting the desired interpretable subgraph requires an accurate approximation of SubMT, yet we find that the existing XGNNs can have a huge gap in fitting SubMT. Consequently, the SubMT approximation failure will lead to the degenerated interpretability of the extracted subgraphs. To mitigate the issue, we design a new XGNN architecture called Graph Multilinear neT (GMT), which is provably more powerful in approximating SubMT. We empirically validate our theoretical findings on a number of graph classification benchmarks. The results demonstrate that GMT outperforms the state-of-the-art up to 10% in terms of both interpretability and generalizability across 12 regular and geometric graph benchmarks.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
surveygenmod2: A SAS macro for estimating complex survey adjusted generalized linear models and Wald-type tests
Authors:
R. Noah Padgett,
Ying Chen
Abstract:
surveygenmod2 builds on the macro written by da Silva (2017) for generalized linear models under complex survey designs. The updated macro fixed several minor bugs we encountered while updating the macro for use in SAS\textregistered. We added additional features for conducting basic Wald-type tests on groups of parameters based on the estimated regression coefficients and parameter variance-covar…
▽ More
surveygenmod2 builds on the macro written by da Silva (2017) for generalized linear models under complex survey designs. The updated macro fixed several minor bugs we encountered while updating the macro for use in SAS\textregistered. We added additional features for conducting basic Wald-type tests on groups of parameters based on the estimated regression coefficients and parameter variance-covariance matrix.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
When Swarm Learning meets energy series data: A decentralized collaborative learning design based on blockchain
Authors:
Lei Xu,
Yulong Chen,
Yuntian Chen,
Longfeng Nie,
Xuetao Wei,
Liang Xue,
Dongxiao Zhang
Abstract:
Machine learning models offer the capability to forecast future energy production or consumption and infer essential unknown variables from existing data. However, legal and policy constraints within specific energy sectors render the data sensitive, presenting technical hurdles in utilizing data from diverse sources. Therefore, we propose adopting a Swarm Learning (SL) scheme, which replaces the…
▽ More
Machine learning models offer the capability to forecast future energy production or consumption and infer essential unknown variables from existing data. However, legal and policy constraints within specific energy sectors render the data sensitive, presenting technical hurdles in utilizing data from diverse sources. Therefore, we propose adopting a Swarm Learning (SL) scheme, which replaces the centralized server with a blockchain-based distributed network to address the security and privacy issues inherent in Federated Learning (FL)'s centralized architecture. Within this distributed Collaborative Learning framework, each participating organization governs nodes for inter-organizational communication. Devices from various organizations utilize smart contracts for parameter uploading and retrieval. Consensus mechanism ensures distributed consistency throughout the learning process, guarantees the transparent trustworthiness and immutability of parameters on-chain. The efficacy of the proposed framework is substantiated across three real-world energy series modeling scenarios with superior performance compared to Local Learning approaches, simultaneously emphasizing enhanced data security and privacy over Centralized Learning and FL method. Notably, as the number of data volume and the count of local epochs increases within a threshold, there is an improvement in model performance accompanied by a reduction in the variance of performance errors. Consequently, this leads to an increased stability and reliability in the outcomes produced by the model.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
Optimization of geological carbon storage operations with multimodal latent dynamic model and deep reinforcement learning
Authors:
Zhongzheng Wang,
Yuntian Chen,
Guodong Chen,
Dongxiao Zhang
Abstract:
Maximizing storage performance in geological carbon storage (GCS) is crucial for commercial deployment, but traditional optimization demands resource-intensive simulations, posing computational challenges. This study introduces the multimodal latent dynamic (MLD) model, a deep learning framework for fast flow prediction and well control optimization in GCS. The MLD model includes a representation…
▽ More
Maximizing storage performance in geological carbon storage (GCS) is crucial for commercial deployment, but traditional optimization demands resource-intensive simulations, posing computational challenges. This study introduces the multimodal latent dynamic (MLD) model, a deep learning framework for fast flow prediction and well control optimization in GCS. The MLD model includes a representation module for compressed latent representations, a transition module for system state evolution, and a prediction module for flow responses. A novel training strategy combining regression loss and joint-embedding consistency loss enhances temporal consistency and multi-step prediction accuracy. Unlike existing models, the MLD supports diverse input modalities, allowing comprehensive data interactions. The MLD model, resembling a Markov decision process (MDP), can train deep reinforcement learning agents, specifically using the soft actor-critic (SAC) algorithm, to maximize net present value (NPV) through continuous interactions. The approach outperforms traditional methods, achieving the highest NPV while reducing computational resources by over 60%. It also demonstrates strong generalization performance, providing improved decisions for new scenarios based on knowledge from previous ones.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
A Noise-robust Multi-head Attention Mechanism for Formation Resistivity Prediction: Frequency Aware LSTM
Authors:
Yongan Zhang,
Junfeng Zhao,
Jian Li,
Xuanran Wang,
Youzhuang Sun,
Yuntian Chen,
Dongxiao Zhang
Abstract:
The prediction of formation resistivity plays a crucial role in the evaluation of oil and gas reservoirs, identification and assessment of geothermal energy resources, groundwater detection and monitoring, and carbon capture and storage. However, traditional well logging techniques fail to measure accurate resistivity in cased boreholes, and the transient electromagnetic method for cased borehole…
▽ More
The prediction of formation resistivity plays a crucial role in the evaluation of oil and gas reservoirs, identification and assessment of geothermal energy resources, groundwater detection and monitoring, and carbon capture and storage. However, traditional well logging techniques fail to measure accurate resistivity in cased boreholes, and the transient electromagnetic method for cased borehole resistivity logging encounters challenges of high-frequency disaster (the problem of inadequate learning by neural networks in high-frequency features) and noise interference, badly affecting accuracy. To address these challenges, frequency-aware framework and temporal anti-noise block are proposed to build frequency aware LSTM (FAL). The frequency-aware framework implements a dual-stream structure through wavelet transformation, allowing the neural network to simultaneously handle high-frequency and low-frequency flows of time-series data, thus avoiding high-frequency disaster. The temporal anti-noise block integrates multiple attention mechanisms and soft-threshold attention mechanisms, enabling the model to better distinguish noise from redundant features. Ablation experiments demonstrate that the frequency-aware framework and temporal anti-noise block contribute significantly to performance improvement. FAL achieves a 24.3% improvement in R2 over LSTM, reaching the highest value of 0.91 among all models. In robustness experiments, the impact of noise on FAL is approximately 1/8 of the baseline, confirming the noise resistance of FAL. The proposed FAL effectively reduces noise interference in predicting formation resistivity from cased transient electromagnetic well logging curves, better learns high-frequency features, and thereby enhances the prediction accuracy and noise resistance of the neural network model.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Cross-variable Linear Integrated ENhanced Transformer for Photovoltaic power forecasting
Authors:
Jiaxin Gao,
Qinglong Cao,
Yuntian Chen,
Dongxiao Zhang
Abstract:
Photovoltaic (PV) power forecasting plays a crucial role in optimizing the operation and planning of PV systems, thereby enabling efficient energy management and grid integration. However, un certainties caused by fluctuating weather conditions and complex interactions between different variables pose significant challenges to accurate PV power forecasting. In this study, we propose PV-Client (Cro…
▽ More
Photovoltaic (PV) power forecasting plays a crucial role in optimizing the operation and planning of PV systems, thereby enabling efficient energy management and grid integration. However, un certainties caused by fluctuating weather conditions and complex interactions between different variables pose significant challenges to accurate PV power forecasting. In this study, we propose PV-Client (Cross-variable Linear Integrated ENhanced Transformer for Photovoltaic power forecasting) to address these challenges and enhance PV power forecasting accuracy. PV-Client employs an ENhanced Transformer module to capture complex interactions of various features in PV systems, and utilizes a linear module to learn trend information in PV power. Diverging from conventional time series-based Transformer models that use cross-time Attention to learn dependencies between different time steps, the Enhanced Transformer module integrates cross-variable Attention to capture dependencies between PV power and weather factors. Furthermore, PV-Client streamlines the embedding and position encoding layers by replacing the Decoder module with a projection layer. Experimental results on three real-world PV power datasets affirm PV-Client's state-of-the-art (SOTA) performance in PV power forecasting. Specifically, PV-Client surpasses the second-best model GRU by 5.3% in MSE metrics and 0.9% in accuracy metrics at the **gang Station. Similarly, PV-Client outperforms the second-best model SVR by 10.1% in MSE metrics and 0.2% in accuracy metrics at the Xinqingnian Station, and PV-Client exhibits superior performance compared to the second-best model SVR with enhancements of 3.4% in MSE metrics and 0.9% in accuracy metrics at the Hongxing Station.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization
Authors:
Yihang Chen,
Fanghui Liu,
Taiji Suzuki,
Volkan Cevher
Abstract:
This paper studies kernel ridge regression in high dimensions under covariate shifts and analyzes the role of importance re-weighting. We first derive the asymptotic expansion of high dimensional kernels under covariate shifts. By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance. For bias, we analyze the regularization of…
▽ More
This paper studies kernel ridge regression in high dimensions under covariate shifts and analyzes the role of importance re-weighting. We first derive the asymptotic expansion of high dimensional kernels under covariate shifts. By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance. For bias, we analyze the regularization of the arbitrary or well-chosen scale, showing that the bias can behave very differently under different regularization scales. In our analysis, the bias and variance can be characterized by the spectral decay of a data-dependent regularized kernel: the original kernel matrix associated with an additional re-weighting matrix, and thus the re-weighting strategy can be regarded as a data-dependent regularization for better understanding. Besides, our analysis provides asymptotic expansion of kernel functions/vectors under covariate shift, which has its own interest.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Discovering an interpretable mathematical expression for a full wind-turbine wake with artificial intelligence enhanced symbolic regression
Authors:
Ding Wang,
Yuntian Chen,
Shiyi Chen
Abstract:
The rapid expansion of wind power worldwide underscores the critical significance of engineering-focused analytical wake models in both the design and operation of wind farms. These theoretically-derived ana lytical wake models have limited predictive capabilities, particularly in the near-wake region close to the turbine rotor, due to assumptions that do not hold. Knowledge discovery methods can…
▽ More
The rapid expansion of wind power worldwide underscores the critical significance of engineering-focused analytical wake models in both the design and operation of wind farms. These theoretically-derived ana lytical wake models have limited predictive capabilities, particularly in the near-wake region close to the turbine rotor, due to assumptions that do not hold. Knowledge discovery methods can bridge these gaps by extracting insights, adjusting for theoretical assumptions, and develo** accurate models for physical processes. In this study, we introduce a genetic symbolic regression (SR) algorithm to discover an interpretable mathematical expression for the mean velocity deficit throughout the wake, a previously unavailable insight. By incorporating a double Gaussian distribution into the SR algorithm as domain knowledge and designing a hierarchical equation structure, the search space is reduced, thus efficiently finding a concise, physically informed, and robust wake model. The proposed mathematical expression (equation) can predict the wake velocity deficit at any location in the full-wake region with high precision and stability. The model's effectiveness and practicality are validated through experimental data and high-fidelity numerical simulations.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Adaptive Penalized Likelihood method for Markov Chains
Authors:
Yining Zhou,
Ming Gao,
Yiting Chen,
** Shi
Abstract:
Maximum Likelihood Estimation (MLE) and Likelihood Ratio Test (LRT) are widely used methods for estimating the transition probability matrix in Markov chains and identifying significant relationships between transitions, such as equality. However, the estimated transition probability matrix derived from MLE lacks accuracy compared to the real one, and LRT is inefficient in high-dimensional Markov…
▽ More
Maximum Likelihood Estimation (MLE) and Likelihood Ratio Test (LRT) are widely used methods for estimating the transition probability matrix in Markov chains and identifying significant relationships between transitions, such as equality. However, the estimated transition probability matrix derived from MLE lacks accuracy compared to the real one, and LRT is inefficient in high-dimensional Markov chains. In this study, we extended the adaptive Lasso technique from linear models to Markov chains and proposed a novel model by applying penalized maximum likelihood estimation to optimize the estimation of the transition probability matrix. Meanwhile, we demonstrated that the new model enjoys oracle properties, which means the estimated transition probability matrix has the same performance as the real one when given. Simulations show that our new method behave very well overall in comparison with various competitors. Real data analysis further convince the value of our proposed method.
△ Less
Submitted 1 June, 2024;
originally announced June 2024.
-
Dynamic Factor Analysis of High-dimensional Recurrent Events
Authors:
Fangyi Chen,
Yunxiao Chen,
Zhiliang Ying,
Kangjie Zhou
Abstract:
Recurrent event time data arise in many studies, including biomedicine, public health, marketing, and social media analysis. High-dimensional recurrent event data involving large numbers of event types and observations become prevalent with the advances in information technology. This paper proposes a semiparametric dynamic factor model for the dimension reduction and prediction of high-dimensiona…
▽ More
Recurrent event time data arise in many studies, including biomedicine, public health, marketing, and social media analysis. High-dimensional recurrent event data involving large numbers of event types and observations become prevalent with the advances in information technology. This paper proposes a semiparametric dynamic factor model for the dimension reduction and prediction of high-dimensional recurrent event data. The proposed model imposes a low-dimensional structure on the mean intensity functions of the event types while allowing for dependencies. A nearly rate-optimal smoothing-based estimator is proposed. An information criterion that consistently selects the number of factors is also developed. Simulation studies demonstrate the effectiveness of these inference tools. The proposed method is applied to grocery shop** data, for which an interpretable factor structure is obtained.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Inference in semiparametric formation models for directed networks
Authors:
Lianqiang Qu,
Lu Chen,
Ting Yan,
Yuguo Chen
Abstract:
We propose a semiparametric model for dyadic link formations in directed networks. The model contains a set of degree parameters that measure different effects of popularity or outgoingness across nodes, a regression parameter vector that reflects the homophily effect resulting from the nodal attributes or pairwise covariates associated with edges, and a set of latent random noises with unknown di…
▽ More
We propose a semiparametric model for dyadic link formations in directed networks. The model contains a set of degree parameters that measure different effects of popularity or outgoingness across nodes, a regression parameter vector that reflects the homophily effect resulting from the nodal attributes or pairwise covariates associated with edges, and a set of latent random noises with unknown distributions. Our interest lies in inferring the unknown degree parameters and homophily parameters. The dimension of the degree parameters increases with the number of nodes. Under the high-dimensional regime, we develop a kernel-based least squares approach to estimate the unknown parameters. The major advantage of our estimator is that it does not encounter the incidental parameter problem for the homophily parameters. We prove consistency of all the resulting estimators of the degree parameters and homophily parameters. We establish high-dimensional central limit theorems for the proposed estimators and provide several applications of our general theory, including testing the existence of degree heterogeneity, testing sparse signals and recovering the support. Simulation studies and a real data application are conducted to illustrate the finite sample performance of the proposed methods.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Clustering Mixtures of Discrete Distributions: A Note on Mitra's Algorithm
Authors:
Mohamed Seif,
Yanxi Chen
Abstract:
In this note, we provide a refined analysis of Mitra's algorithm \cite{mitra2008clustering} for classifying general discrete mixture distribution models. Built upon spectral clustering \cite{mcsherry2001spectral}, this algorithm offers compelling conditions for probability distributions. We enhance this analysis by tailoring the model to bipartite stochastic block models, resulting in more refined…
▽ More
In this note, we provide a refined analysis of Mitra's algorithm \cite{mitra2008clustering} for classifying general discrete mixture distribution models. Built upon spectral clustering \cite{mcsherry2001spectral}, this algorithm offers compelling conditions for probability distributions. We enhance this analysis by tailoring the model to bipartite stochastic block models, resulting in more refined conditions. Compared to those derived in \cite{mitra2008clustering}, our improved separation conditions are obtained.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play Priors
Authors:
Zihui Wu,
Yu Sun,
Yifan Chen,
Bingliang Zhang,
Yisong Yue,
Katherine L. Bouman
Abstract:
Diffusion models (DMs) have recently shown outstanding capability in modeling complex image distributions, making them expressive image priors for solving Bayesian inverse problems. However, most existing DM-based methods rely on approximations in the generative process to be generic to different inverse problems, leading to inaccurate sample distributions that deviate from the target posterior de…
▽ More
Diffusion models (DMs) have recently shown outstanding capability in modeling complex image distributions, making them expressive image priors for solving Bayesian inverse problems. However, most existing DM-based methods rely on approximations in the generative process to be generic to different inverse problems, leading to inaccurate sample distributions that deviate from the target posterior defined within the Bayesian framework. To harness the generative power of DMs while avoiding such approximations, we propose a Markov chain Monte Carlo algorithm that performs posterior sampling for general inverse problems by reducing it to sampling the posterior of a Gaussian denoising problem. Crucially, we leverage a general DM formulation as a unified interface that allows for rigorously solving the denoising problem with a range of state-of-the-art DMs. We demonstrate the effectiveness of the proposed method on six inverse problems (three linear and three nonlinear), including a real-world black hole imaging problem. Experimental results indicate that our proposed method offers more accurate reconstructions and posterior estimation compared to existing DM-based imaging inverse methods.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Towards robust prediction of material properties for nuclear reactor design under scarce data -- a study in creep rupture property
Authors:
Yu Chen,
Edoardo Patelli,
Zhen Yang,
Adolphus Lye
Abstract:
Advances in Deep Learning bring further investigation into credibility and robustness, especially for safety-critical engineering applications such as the nuclear industry. The key challenges include the availability of data set (often scarce and sparse) and insufficient consideration of the uncertainty in the data, model, and prediction. This paper therefore presents a meta-learning based approac…
▽ More
Advances in Deep Learning bring further investigation into credibility and robustness, especially for safety-critical engineering applications such as the nuclear industry. The key challenges include the availability of data set (often scarce and sparse) and insufficient consideration of the uncertainty in the data, model, and prediction. This paper therefore presents a meta-learning based approach that is both uncertainty- and prior knowledge-informed, aiming at trustful predictions of material properties for the nuclear reactor design. It is suited for robust learning under limited data. Uncertainty has been accounted for where a distribution of predictor functions are produced for extrapolation. Results suggest it achieves superior performance than existing empirical methods in rupture life prediction, a case which is typically under a small data regime. While demonstrated herein with rupture properties, this learning approach is transferable to solve similar problems of data scarcity across the nuclear industry. It is of great importance to boosting the AI analytics in the nuclear industry by proving the applicability and robustness while providing tools that can be trusted.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control
Authors:
Litu Rout,
Yujia Chen,
Nataniel Ruiz,
Abhishek Kumar,
Constantine Caramanis,
Sanjay Shakkottai,
Wen-Sheng Chu
Abstract:
We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of styl…
▽ More
We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of style and content. RB-Modulation is built on a novel stochastic optimal controller where a style descriptor encodes the desired attributes through a terminal cost. The resulting drift not only overcomes the difficulties above, but also ensures high fidelity to the reference style and adheres to the given text prompt. We also introduce a cross-attention-based feature aggregation scheme that allows RB-Modulation to decouple content and style from the reference image. With theoretical justification and empirical evidence, our framework demonstrates precise extraction and control of content and style in a training-free manner. Further, our method allows a seamless composition of content and style, which marks a departure from the dependency on external adapters or ControlNets.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize
Authors:
Dongyan Huo,
Yixuan Zhang,
Yudong Chen,
Qiaomin Xie
Abstract:
In this work, we investigate stochastic approximation (SA) with Markovian data and nonlinear updates under constant stepsize $α>0$. Existing work has primarily focused on either i.i.d. data or linear update rules. We take a new perspective and carefully examine the simultaneous presence of Markovian dependency of data and nonlinear update rules, delineating how the interplay between these two stru…
▽ More
In this work, we investigate stochastic approximation (SA) with Markovian data and nonlinear updates under constant stepsize $α>0$. Existing work has primarily focused on either i.i.d. data or linear update rules. We take a new perspective and carefully examine the simultaneous presence of Markovian dependency of data and nonlinear update rules, delineating how the interplay between these two structures leads to complications that are not captured by prior techniques. By leveraging the smoothness and recurrence properties of the SA updates, we develop a fine-grained analysis of the correlation between the SA iterates $θ_k$ and Markovian data $x_k$. This enables us to overcome the obstacles in existing analysis and establish for the first time the weak convergence of the joint process $(x_k, θ_k)_{k\geq0}$. Furthermore, we present a precise characterization of the asymptotic bias of the SA iterates, given by $\mathbb{E}[θ_\infty]-θ^\ast=α(b_\text{m}+b_\text{n}+b_\text{c})+O(α^{3/2})$. Here, $b_\text{m}$ is associated with the Markovian noise, $b_\text{n}$ is tied to the nonlinearity, and notably, $b_\text{c}$ represents a multiplicative interaction between the Markovian noise and nonlinearity, which is absent in previous works. As a by-product of our analysis, we derive finite-time bounds on higher moment $\mathbb{E}[\|θ_k-θ^\ast\|^{2p}]$ and present non-asymptotic geometric convergence rates for the iterates, along with a Central Limit Theorem.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.
-
A Latent Variable Approach to Learning High-dimensional Multivariate longitudinal Data
Authors:
Sze Ming Lee,
Yunxiao Chen,
Tony Sit
Abstract:
High-dimensional multivariate longitudinal data, which arise when many outcome variables are measured repeatedly over time, are becoming increasingly common in social, behavioral and health sciences. We propose a latent variable model for drawing statistical inferences on covariate effects and predicting future outcomes based on high-dimensional multivariate longitudinal data. This model introduce…
▽ More
High-dimensional multivariate longitudinal data, which arise when many outcome variables are measured repeatedly over time, are becoming increasingly common in social, behavioral and health sciences. We propose a latent variable model for drawing statistical inferences on covariate effects and predicting future outcomes based on high-dimensional multivariate longitudinal data. This model introduces unobserved factors to account for the between-variable and across-time dependence and assist the prediction. Statistical inference and prediction tools are developed under a general setting that allows outcome variables to be of mixed types and possibly unobserved for certain time points, for example, due to right censoring. A central limit theorem is established for drawing statistical inferences on regression coefficients. Additionally, an information criterion is introduced to choose the number of factors. The proposed model is applied to customer grocery shop** records to predict and understand shop** behavior.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Generalized Laplace Approximation
Authors:
Yinsong Chen,
Samson S. Yu,
Zhong Li,
Chee Peng Lim
Abstract:
In recent years, the inconsistency in Bayesian deep learning has garnered increasing attention. Tempered or generalized posterior distributions often offer a direct and effective solution to this issue. However, understanding the underlying causes and evaluating the effectiveness of generalized posteriors remain active areas of research. In this study, we introduce a unified theoretical framework…
▽ More
In recent years, the inconsistency in Bayesian deep learning has garnered increasing attention. Tempered or generalized posterior distributions often offer a direct and effective solution to this issue. However, understanding the underlying causes and evaluating the effectiveness of generalized posteriors remain active areas of research. In this study, we introduce a unified theoretical framework to attribute Bayesian inconsistency to model misspecification and inadequate priors. We interpret the generalization of the posterior with a temperature factor as a correction for misspecified models through adjustments to the joint probability model, and the recalibration of priors by redistributing probability mass on models within the hypothesis space using data samples. Additionally, we highlight a distinctive feature of Laplace approximation, which ensures that the generalized normalizing constant can be treated as invariant, unlike the typical scenario in general Bayesian learning where this constant varies with model parameters post-generalization. Building on this insight, we propose the generalized Laplace approximation, which involves a simple adjustment to the computation of the Hessian matrix of the regularized loss function. This method offers a flexible and scalable framework for obtaining high-quality posterior distributions. We assess the performance and properties of the generalized Laplace approximation on state-of-the-art neural networks and real-world datasets.
△ Less
Submitted 24 May, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Gaussian Measures Conditioned on Nonlinear Observations: Consistency, MAP Estimators, and Simulation
Authors:
Yifan Chen,
Bamdad Hosseini,
Houman Owhadi,
Andrew M Stuart
Abstract:
The article presents a systematic study of the problem of conditioning a Gaussian random variable $ξ$ on nonlinear observations of the form $F \circ φ(ξ)$ where $φ: \mathcal{X} \to \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the condition…
▽ More
The article presents a systematic study of the problem of conditioning a Gaussian random variable $ξ$ on nonlinear observations of the form $F \circ φ(ξ)$ where $φ: \mathcal{X} \to \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the conditioned random variable $ξ\mid F\circ φ(ξ)$, stating that it decomposes as the sum of an infinite-dimensional Gaussian (which is identified analytically) as well as a finite-dimensional non-Gaussian measure. We also introduce a novel notion of the mode of a conditional measure by taking the limit of the natural relaxation of the problem, to which we can apply the existing notion of maximum a posteriori estimators of posterior measures. Finally, we introduce a variant of the Laplace approximation for the efficient simulation of the aforementioned conditioned Gaussian random variables towards uncertainty quantification.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
Determine the Number of States in Hidden Markov Models via Marginal Likelihood
Authors:
Yang Chen,
Cheng-Der Fuh,
Chu-Lan Michael Kao
Abstract:
Hidden Markov models (HMM) have been widely used by scientists to model stochastic systems: the underlying process is a discrete Markov chain and the observations are noisy realizations of the underlying process. Determining the number of hidden states for an HMM is a model selection problem, which is yet to be satisfactorily solved, especially for the popular Gaussian HMM with heterogeneous covar…
▽ More
Hidden Markov models (HMM) have been widely used by scientists to model stochastic systems: the underlying process is a discrete Markov chain and the observations are noisy realizations of the underlying process. Determining the number of hidden states for an HMM is a model selection problem, which is yet to be satisfactorily solved, especially for the popular Gaussian HMM with heterogeneous covariance. In this paper, we propose a consistent method for determining the number of hidden states of HMM based on the marginal likelihood, which is obtained by integrating out both the parameters and hidden states. Moreover, we show that the model selection problem of HMM includes the order selection problem of finite mixture models as a special case. We give rigorous proof of the consistency of the proposed marginal likelihood method and provide an efficient computation method for practical implementation. We numerically compare the proposed method with the Bayesian information criterion (BIC), demonstrating the effectiveness of the proposed marginal likelihood method.
△ Less
Submitted 20 May, 2024;
originally announced May 2024.
-
Solar Imaging Data Analytics: A Selective Overview of Challenges and Opportunities
Authors:
Yang Chen,
Ward Manchester,
Meng **,
Alexei Pevtsov
Abstract:
We give a gentle introduction to solar imaging data, focusing on the challenges and opportunities of data-driven approaches for solar eruptions. The various solar phenomenon prediction problems that might benefit from statistical methods are presented. Available data products and software are described. State-of-the-art solar eruption forecasting models with data-driven approaches are summarized a…
▽ More
We give a gentle introduction to solar imaging data, focusing on the challenges and opportunities of data-driven approaches for solar eruptions. The various solar phenomenon prediction problems that might benefit from statistical methods are presented. Available data products and software are described. State-of-the-art solar eruption forecasting models with data-driven approaches are summarized and discussed. Based on the characteristics of the datasets and state-of-the-art approaches, we point out several promising directions to explore from statistical modeling and computational perspectives.
△ Less
Submitted 2 July, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Nonparametric Inference on Dose-Response Curves Without the Positivity Condition
Authors:
Yikun Zhang,
Yen-Chi Chen,
Alexander Giessing
Abstract:
Existing statistical methods in causal inference often rely on the assumption that every individual has some chance of receiving any treatment level regardless of its associated covariates, which is known as the positivity condition. This assumption could be violated in observational studies with continuous treatments. In this paper, we present a novel integral estimator of the causal effects with…
▽ More
Existing statistical methods in causal inference often rely on the assumption that every individual has some chance of receiving any treatment level regardless of its associated covariates, which is known as the positivity condition. This assumption could be violated in observational studies with continuous treatments. In this paper, we present a novel integral estimator of the causal effects with continuous treatments (i.e., dose-response curves) without requiring the positivity condition. Our approach involves estimating the derivative function of the treatment effect on each observed data sample and integrating it to the treatment level of interest so as to address the bias resulting from the lack of positivity condition. The validity of our approach relies on an alternative weaker assumption that can be satisfied by additive confounding models. We provide a fast and reliable numerical recipe for computing our estimator in practice and derive its related asymptotic theory. To conduct valid inference on the dose-response curve and its derivative, we propose using the nonparametric bootstrap and establish its consistency. The practical performances of our proposed estimators are validated through simulation studies and an analysis of the effect of air pollution exposure (PM$_{2.5}$) on cardiovascular mortality rates.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
Promoting AI Equity in Science: Generalized Domain Prompt Learning for Accessible VLM Research
Authors:
Qinglong Cao,
Yuntian Chen,
Lu Lu,
Hao Sun,
Zhenzhong Zeng,
Xiaokang Yang,
Dongxiao Zhang
Abstract:
Large-scale Vision-Language Models (VLMs) have demonstrated exceptional performance in natural vision tasks, motivating researchers across domains to explore domain-specific VLMs. However, the construction of powerful domain-specific VLMs demands vast amounts of annotated data, substantial electrical energy, and computing resources, primarily accessible to industry, yet hindering VLM research in a…
▽ More
Large-scale Vision-Language Models (VLMs) have demonstrated exceptional performance in natural vision tasks, motivating researchers across domains to explore domain-specific VLMs. However, the construction of powerful domain-specific VLMs demands vast amounts of annotated data, substantial electrical energy, and computing resources, primarily accessible to industry, yet hindering VLM research in academia. To address this challenge and foster sustainable and equitable VLM research, we present the Generalized Domain Prompt Learning (GDPL) framework. GDPL facilitates the transfer of VLMs' robust recognition capabilities from natural vision to specialized domains, without the need for extensive data or resources. By leveraging small-scale domain-specific foundation models and minimal prompt samples, GDPL empowers the language branch with domain knowledge through quaternion networks, uncovering cross-modal relationships between domain-specific vision features and natural vision-based contextual embeddings. Simultaneously, GDPL guides the vision branch into specific domains through hierarchical propagation of generated vision prompt features, grounded in well-matched vision-language relations. Furthermore, to fully harness the domain adaptation potential of VLMs, we introduce a novel low-rank adaptation approach. Extensive experiments across diverse domains like remote sensing, medical imaging, geology, Synthetic Aperture Radar, and fluid dynamics, validate the efficacy of GDPL, demonstrating its ability to achieve state-of-the-art domain recognition performance in a prompt learning paradigm. Our framework paves the way for sustainable and inclusive VLM research, transcending the barriers between academia and industry.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
LLM4ED: Large Language Models for Automatic Equation Discovery
Authors:
Mengge Du,
Yuntian Chen,
Zhongzheng Wang,
Longfeng Nie,
Dongxiao Zhang
Abstract:
Equation discovery is aimed at directly extracting physical laws from data and has emerged as a pivotal research domain. Previous methods based on symbolic mathematics have achieved substantial advancements, but often require the design of implementation of complex algorithms. In this paper, we introduce a new framework that utilizes natural language-based prompts to guide large language models (L…
▽ More
Equation discovery is aimed at directly extracting physical laws from data and has emerged as a pivotal research domain. Previous methods based on symbolic mathematics have achieved substantial advancements, but often require the design of implementation of complex algorithms. In this paper, we introduce a new framework that utilizes natural language-based prompts to guide large language models (LLMs) in automatically mining governing equations from data. Specifically, we first utilize the generation capability of LLMs to generate diverse equations in string form, and then evaluate the generated equations based on observations. In the optimization phase, we propose two alternately iterated strategies to optimize generated equations collaboratively. The first strategy is to take LLMs as a black-box optimizer and achieve equation self-improvement based on historical samples and their performance. The second strategy is to instruct LLMs to perform evolutionary operators for global search. Experiments are extensively conducted on both partial differential equations and ordinary differential equations. Results demonstrate that our framework can discover effective equations to reveal the underlying physical laws under various nonlinear dynamic systems. Further comparisons are made with state-of-the-art models, demonstrating good stability and usability. Our framework substantially lowers the barriers to learning and applying equation discovery techniques, demonstrating the application potential of LLMs in the field of knowledge discovery.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
WATCH: A Workflow to Assess Treatment Effect Heterogeneity in Drug Development for Clinical Trial Sponsors
Authors:
Konstantinos Sechidis,
Sophie Sun,
Yao Chen,
Jiarui Lu,
Cong Zang,
Mark Baillie,
David Ohlssen,
Marc Vandemeulebroecke,
Rob Hemmings,
Stephen Ruberg,
Björn Bornkamp
Abstract:
This paper proposes a Workflow for Assessing Treatment effeCt Heterogeneity (WATCH) in clinical drug development targeted at clinical trial sponsors. The workflow is designed to address the challenges of investigating treatment effect heterogeneity (TEH) in randomized clinical trials, where sample size and multiplicity limit the reliability of findings. The proposed workflow includes four steps: A…
▽ More
This paper proposes a Workflow for Assessing Treatment effeCt Heterogeneity (WATCH) in clinical drug development targeted at clinical trial sponsors. The workflow is designed to address the challenges of investigating treatment effect heterogeneity (TEH) in randomized clinical trials, where sample size and multiplicity limit the reliability of findings. The proposed workflow includes four steps: Analysis Planning, Initial Data Analysis and Analysis Dataset Creation, TEH Exploration, and Multidisciplinary Assessment. The workflow aims to provide a systematic approach to explore treatment effect heterogeneity in the exploratory setting, taking into account external evidence and best scientific understanding.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Conformalized Tensor Completion with Riemannian Optimization
Authors:
Hu Sun,
Yang Chen
Abstract:
Tensor data, or multi-dimensional array, is a data format popular in multiple fields such as social network analysis, recommender systems, and brain imaging. It is not uncommon to observe tensor data containing missing values and tensor completion aims at estimating the missing values given the partially observed tensor. Sufficient efforts have been spared on devising scalable tensor completion al…
▽ More
Tensor data, or multi-dimensional array, is a data format popular in multiple fields such as social network analysis, recommender systems, and brain imaging. It is not uncommon to observe tensor data containing missing values and tensor completion aims at estimating the missing values given the partially observed tensor. Sufficient efforts have been spared on devising scalable tensor completion algorithms but few on quantifying the uncertainty of the estimator. In this paper, we nest the uncertainty quantification (UQ) of tensor completion under a split conformal prediction framework and establish the connection of the UQ problem to a problem of estimating the missing propensity of each tensor entry. We model the data missingness of the tensor with a tensor Ising model parameterized by a low-rank tensor parameter. We propose to estimate the tensor parameter by maximum pseudo-likelihood estimation (MPLE) with a Riemannian gradient descent algorithm. Extensive simulation studies have been conducted to justify the validity of the resulting conformal interval. We apply our method to the regional total electron content (TEC) reconstruction problem.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Regression for matrix-valued data via Kronecker products factorization
Authors:
Yin-Jen Chen,
Minh Tang
Abstract:
We study the matrix-variate regression problem $Y_i = \sum_{k} β_{1k} X_i β_{2k}^{\top} + E_i$ for $i=1,2\dots,n$ in the high dimensional regime wherein the response $Y_i$ are matrices whose dimensions $p_{1}\times p_{2}$ outgrow both the sample size $n$ and the dimensions $q_{1}\times q_{2}$ of the predictor variables $X_i$ i.e., $q_{1},q_{2} \ll n \ll p_{1},p_{2}$. We propose an estimation algor…
▽ More
We study the matrix-variate regression problem $Y_i = \sum_{k} β_{1k} X_i β_{2k}^{\top} + E_i$ for $i=1,2\dots,n$ in the high dimensional regime wherein the response $Y_i$ are matrices whose dimensions $p_{1}\times p_{2}$ outgrow both the sample size $n$ and the dimensions $q_{1}\times q_{2}$ of the predictor variables $X_i$ i.e., $q_{1},q_{2} \ll n \ll p_{1},p_{2}$. We propose an estimation algorithm, termed KRO-PRO-FAC, for estimating the parameters $\{β_{1k}\} \subset \Re^{p_1 \times q_1}$ and $\{β_{2k}\} \subset \Re^{p_2 \times q_2}$ that utilizes the Kronecker product factorization and rearrangement operations from Van Loan and Pitsianis (1993). The KRO-PRO-FAC algorithm is computationally efficient as it does not require estimating the covariance between the entries of the $\{Y_i\}$. We establish perturbation bounds between $\hatβ_{1k} -β_{1k}$ and $\hatβ_{2k} - β_{2k}$ in spectral norm for the setting where either the rows of $E_i$ or the columns of $E_i$ are independent sub-Gaussian random vectors. Numerical studies on simulated and real data indicate that our procedure is competitive, in terms of both estimation error and predictive accuracy, compared to other existing methods.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
CVTN: Cross Variable and Temporal Integration for Time Series Forecasting
Authors:
Han Zhou,
Yuntian Chen
Abstract:
In multivariate time series forecasting, the Transformer architecture encounters two significant challenges: effectively mining features from historical sequences and avoiding overfitting during the learning of temporal dependencies. To tackle these challenges, this paper deconstructs time series forecasting into the learning of historical sequences and prediction sequences, introducing the Cross-…
▽ More
In multivariate time series forecasting, the Transformer architecture encounters two significant challenges: effectively mining features from historical sequences and avoiding overfitting during the learning of temporal dependencies. To tackle these challenges, this paper deconstructs time series forecasting into the learning of historical sequences and prediction sequences, introducing the Cross-Variable and Time Network (CVTN). This unique method divides multivariate time series forecasting into two phases: cross-variable learning for effectively mining fea tures from historical sequences, and cross-time learning to capture the temporal dependencies of prediction sequences. Separating these two phases helps avoid the impact of overfitting in cross-time learning on cross-variable learning. Exten sive experiments on various real-world datasets have confirmed its state-of-the-art (SOTA) performance. CVTN emphasizes three key dimensions in time series fore casting: the short-term and long-term nature of time series (locality and longevity), feature mining from both historical and prediction sequences, and the integration of cross-variable and cross-time learning. This approach not only advances the current state of time series forecasting but also provides a more comprehensive framework for future research in this field.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
Enhancing Uncertain Demand Prediction in Hospitals Using Simple and Advanced Machine Learning
Authors:
Annie Hu,
Samuel Stockman,
Xun Wu,
Richard Wood,
Bangdong Zhi,
Oliver Y. Chén
Abstract:
Early and timely prediction of patient care demand not only affects effective resource allocation but also influences clinical decision-making as well as patient experience. Accurately predicting patient care demand, however, is a ubiquitous challenge for hospitals across the world due, in part, to the demand's time-varying temporal variability, and, in part, to the difficulty in modelling trends…
▽ More
Early and timely prediction of patient care demand not only affects effective resource allocation but also influences clinical decision-making as well as patient experience. Accurately predicting patient care demand, however, is a ubiquitous challenge for hospitals across the world due, in part, to the demand's time-varying temporal variability, and, in part, to the difficulty in modelling trends in advance. To address this issue, here, we develop two methods, a relatively simple time-vary linear model, and a more advanced neural network model. The former forecasts patient arrivals hourly over a week based on factors such as day of the week and previous 7-day arrival patterns. The latter leverages a long short-term memory (LSTM) model, capturing non-linear relationships between past data and a three-day forecasting window. We evaluate the predictive capabilities of the two proposed approaches compared to two naïve approaches - a reduced-rank vector autoregressive (VAR) model and the TBATS model. Using patient care demand data from Rambam Medical Center in Israel, our results show that both proposed models effectively capture hourly variations of patient demand. Additionally, the linear model is more explainable thanks to its simple architecture, whereas, by accurately modelling weekly seasonal trends, the LSTM model delivers lower prediction errors. Taken together, our explorations suggest the utility of machine learning in predicting time-varying patient care demand; additionally, it is possible to predict patient care demand with good accuracy (around 4 patients) three days or a week in advance using machine learning.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
Bridging Data Barriers among Participants: Assessing the Potential of Geoenergy through Federated Learning
Authors:
Weike Peng,
Jiaxin Gao,
Yuntian Chen,
Shengwei Wang
Abstract:
Machine learning algorithms emerge as a promising approach in energy fields, but its practical is hindered by data barriers, stemming from high collection costs and privacy concerns. This study introduces a novel federated learning (FL) framework based on XGBoost models, enabling safe collaborative modeling with accessible yet concealed data from multiple parties. Hyperparameter tuning of the mode…
▽ More
Machine learning algorithms emerge as a promising approach in energy fields, but its practical is hindered by data barriers, stemming from high collection costs and privacy concerns. This study introduces a novel federated learning (FL) framework based on XGBoost models, enabling safe collaborative modeling with accessible yet concealed data from multiple parties. Hyperparameter tuning of the models is achieved through Bayesian Optimization. To ascertain the merits of the proposed FL-XGBoost method, a comparative analysis is conducted between separate and centralized models to address a classical binary classification problem in geoenergy sector. The results reveal that the proposed FL framework strikes an optimal balance between privacy and accuracy. FL models demonstrate superior accuracy and generalization capabilities compared to separate models, particularly for participants with limited data or low correlation features and offers significant privacy benefits compared to centralized model. The aggregated optimization approach within the FL agreement proves effective in tuning hyperparameters. This study opens new avenues for assessing unconventional reservoirs through collaborative and privacy-preserving FL techniques.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
Likelihood Based Inference in Fully and Partially Observed Exponential Family Graphical Models with Intractable Normalizing Constants
Authors:
Yujie Chen,
Anindya Bhadra,
Antik Chakraborty
Abstract:
Probabilistic graphical models that encode an underlying Markov random field are fundamental building blocks of generative modeling to learn latent representations in modern multivariate data sets with complex dependency structures. Among these, the exponential family graphical models are especially popular, given their fairly well-understood statistical properties and computational scalability to…
▽ More
Probabilistic graphical models that encode an underlying Markov random field are fundamental building blocks of generative modeling to learn latent representations in modern multivariate data sets with complex dependency structures. Among these, the exponential family graphical models are especially popular, given their fairly well-understood statistical properties and computational scalability to high-dimensional data based on pseudo-likelihood methods. These models have been successfully applied in many fields, such as the Ising model in statistical physics and count graphical models in genomics. Another strand of models allows some nodes to be latent, so as to allow the marginal distribution of the observable nodes to depart from exponential family to capture more complex dependence. These approaches form the basis of generative models in artificial intelligence, such as the Boltzmann machines and their restricted versions. A fundamental barrier to likelihood-based (i.e., both maximum likelihood and fully Bayesian) inference in both fully and partially observed cases is the intractability of the likelihood. The usual workaround is via adopting pseudo-likelihood based approaches, following the pioneering work of Besag (1974). The goal of this paper is to demonstrate that full likelihood based analysis of these models is feasible in a computationally efficient manner. The chief innovation lies in using a technique of Geyer (1991) to estimate the intractable normalizing constant, as well as its gradient, for intractable graphical models. Extensive numerical results, supporting theory and comparisons with pseudo-likelihood based approaches demonstrate the applicability of the proposed method.
△ Less
Submitted 26 April, 2024;
originally announced April 2024.
-
Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive SA
Authors:
Yixuan Zhang,
Dongyan Huo,
Yudong Chen,
Qiaomin Xie
Abstract:
Motivated by Q-learning, we study nonsmooth contractive stochastic approximation (SA) with constant stepsize. We focus on two important classes of dynamics: 1) nonsmooth contractive SA with additive noise, and 2) synchronous and asynchronous Q-learning, which features both additive and multiplicative noise. For both dynamics, we establish weak convergence of the iterates to a stationary limit dist…
▽ More
Motivated by Q-learning, we study nonsmooth contractive stochastic approximation (SA) with constant stepsize. We focus on two important classes of dynamics: 1) nonsmooth contractive SA with additive noise, and 2) synchronous and asynchronous Q-learning, which features both additive and multiplicative noise. For both dynamics, we establish weak convergence of the iterates to a stationary limit distribution in Wasserstein distance. Furthermore, we propose a prelimit coupling technique for establishing steady-state convergence and characterize the limit of the stationary distribution as the stepsize goes to zero. Using this result, we derive that the asymptotic bias of nonsmooth SA is proportional to the square root of the stepsize, which stands in sharp contrast to smooth SA. This bias characterization allows for the use of Richardson-Romberg extrapolation for bias reduction in nonsmooth SA.
△ Less
Submitted 24 April, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Generative Model for Change Point Detection in Dynamic Graphs
Authors:
Yik Lun Kei,
Jialiang Li,
Hangjian Li,
Yanzhen Chen,
Oscar Hernan Madrid Padilla
Abstract:
This paper proposes a generative model to detect change points in time series of graphs. The proposed framework consists of learnable prior distributions for low-dimensional graph representations and of a decoder that can generate graphs from the latent representations. The informative prior distributions in the latent spaces are learned from the observed data as empirical Bayes, and the expressiv…
▽ More
This paper proposes a generative model to detect change points in time series of graphs. The proposed framework consists of learnable prior distributions for low-dimensional graph representations and of a decoder that can generate graphs from the latent representations. The informative prior distributions in the latent spaces are learned from the observed data as empirical Bayes, and the expressive power of generative model is exploited to assist multiple change point detection. Specifically, the model parameters are learned via maximum approximate likelihood, with a Group Fused Lasso regularization on the prior parameters. The optimization problem is then solved via Alternating Direction Method of Multipliers (ADMM), and Langevin Dynamics are recruited for posterior inference. Experiments in both simulated and real data demonstrate the ability of the generative model in supporting change point detection with good performance.
△ Less
Submitted 21 June, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
-
Bayesian Methods for Modeling Cumulative Exposure to Extensive Environmental Health Hazards
Authors:
Rob Trangucci,
Jesse Contreras,
Jon Zelner,
Joseph N. S. Eisenberg,
Yang Chen
Abstract:
Measuring the impact of an environmental point source exposure on the risk of disease, like cancer or childhood asthma, is well-developed. Modeling how an environmental health hazard that is extensive in space, like a wastewater canal, is not. We propose a novel Bayesian generative semiparametric model for characterizing the cumulative spatial exposure to an environmental health hazard that is not…
▽ More
Measuring the impact of an environmental point source exposure on the risk of disease, like cancer or childhood asthma, is well-developed. Modeling how an environmental health hazard that is extensive in space, like a wastewater canal, is not. We propose a novel Bayesian generative semiparametric model for characterizing the cumulative spatial exposure to an environmental health hazard that is not well-represented by a single point in space. The model couples a dose-response model with a log-Gaussian Cox process integrated against a distance kernel with an unknown length-scale. We show that this model is a well-defined Bayesian inverse model, namely that the posterior exists under a Gaussian process prior for the log-intensity of exposure, and that a simple integral approximation adequately controls the computational error. We quantify the finite-sample properties and the computational tractability of the discretization scheme in a simulation study. Finally, we apply the model to survey data on household risk of childhood diarrheal illness from exposure to a system of wastewater canals in Mezquital Valley, Mexico.
△ Less
Submitted 5 April, 2024;
originally announced April 2024.
-
Time-Varying Matrix Factor Models
Authors:
Bin Chen,
Elynn Y. Chen,
Stevenson Bolivar,
Rong Chen
Abstract:
Matrix-variate data of high dimensions are frequently observed in finance and economics, spanning extended time periods, such as the long-term data on international trade flows among numerous countries. To address potential structural shifts and explore the matrix structure's informational context, we propose a time-varying matrix factor model. This model accommodates changing factor loadings over…
▽ More
Matrix-variate data of high dimensions are frequently observed in finance and economics, spanning extended time periods, such as the long-term data on international trade flows among numerous countries. To address potential structural shifts and explore the matrix structure's informational context, we propose a time-varying matrix factor model. This model accommodates changing factor loadings over time, revealing the underlying dynamic structure through nonparametric principal component analysis and facilitating dimension reduction. We establish the consistency and asymptotic normality of our estimators under general conditions that allow for weak correlations across time, rows, or columns of the noise. A novel approach is introduced to overcome rotational ambiguity in the estimators, enhancing the clarity and interpretability of the estimated loading matrices. Our simulation study highlights the merits of the proposed estimators and the effective of the smoothing operation. In an application to international trade flow, we investigate the trading hubs, centrality, patterns, and trends in the trading network.
△ Less
Submitted 1 April, 2024;
originally announced April 2024.
-
"Sound and Fury": Nonlinear Functionals of Volatility Matrix in the Presence of Jump and Noise
Authors:
Richard Y. Chen
Abstract:
This paper resolves a pivotal open problem on nonparametric inference for nonlinear functionals of volatility matrix. Multiple prominent statistical tasks can be formulated as functionals of volatility matrix, yet a unified statistical theory of general nonlinear functionals based on noisy data remains challenging and elusive. Nonetheless, this paper shows it can be achieved by combining the stren…
▽ More
This paper resolves a pivotal open problem on nonparametric inference for nonlinear functionals of volatility matrix. Multiple prominent statistical tasks can be formulated as functionals of volatility matrix, yet a unified statistical theory of general nonlinear functionals based on noisy data remains challenging and elusive. Nonetheless, this paper shows it can be achieved by combining the strengths of pre-averaging, jump truncation and nonlinearity bias correction. In light of general nonlinearity, bias correction beyond linear approximation becomes necessary. Resultant estimators are nonparametric and robust over a wide spectrum of stochastic models. Moreover, the estimators can be rate-optimal and stable central limit theorems are obtained. The proposed framework lends itself conveniently to uncertainty quantification and permits fully feasible inference. With strong theoretical guarantees, this paper provides an inferential foundation for a wealth of statistical methods for noisy high-frequency data, such as realized principal component analysis, continuous-time linear regression, realized Laplace transform, generalized method of integrated moments and specification tests, hence extends current application scopes to noisy data which is more prevalent in practice.
△ Less
Submitted 31 March, 2024;
originally announced April 2024.
-
Optimal Contract Design for End-of-Life Care Payments
Authors:
Muyan Jiang,
Ying Chen,
Xin Chen,
Javad Lavaei,
Anil Aswani
Abstract:
A large fraction of total healthcare expenditure occurs due to end-of-life (EOL) care, which means it is important to study the problem of more carefully incentivizing necessary versus unnecessary EOL care because this has the potential to reduce overall healthcare spending. This paper introduces a principal-agent model that integrates a mixed payment system of fee-for-service and pay-for-performa…
▽ More
A large fraction of total healthcare expenditure occurs due to end-of-life (EOL) care, which means it is important to study the problem of more carefully incentivizing necessary versus unnecessary EOL care because this has the potential to reduce overall healthcare spending. This paper introduces a principal-agent model that integrates a mixed payment system of fee-for-service and pay-for-performance in order to analyze whether it is possible to better align healthcare provider incentives with patient outcomes and cost-efficiency in EOL care. The primary contributions are to derive optimal contracts for EOL care payments using a principal-agent framework under three separate models for the healthcare provider, where each model considers a different level of risk tolerance for the provider. We derive these optimal contracts by converting the underlying principal-agent models from a bilevel optimization problem into a single-level optimization problem that can be analytically solved. Our results are demonstrated using a simulation where an optimal contract is used to price intracranial pressure monitoring for traumatic brain injuries.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Probabilistic Forecasting with Stochastic Interpolants and Föllmer Processes
Authors:
Yifan Chen,
Mark Goldstein,
Mengjian Hua,
Michael S. Albergo,
Nicholas M. Boffi,
Eric Vanden-Eijnden
Abstract:
We propose a framework for probabilistic forecasting of dynamical systems based on generative modeling. Given observations of the system state over time, we formulate the forecasting problem as sampling from the conditional distribution of the future system state given its current state. To this end, we leverage the framework of stochastic interpolants, which facilitates the construction of a gene…
▽ More
We propose a framework for probabilistic forecasting of dynamical systems based on generative modeling. Given observations of the system state over time, we formulate the forecasting problem as sampling from the conditional distribution of the future system state given its current state. To this end, we leverage the framework of stochastic interpolants, which facilitates the construction of a generative model between an arbitrary base distribution and the target. We design a fictitious, non-physical stochastic dynamics that takes as initial condition the current system state and produces as output a sample from the target conditional distribution in finite time and without bias. This process therefore maps a point mass centered at the current state onto a probabilistic ensemble of forecasts. We prove that the drift coefficient entering the stochastic differential equation (SDE) achieving this task is non-singular, and that it can be learned efficiently by square loss regression over the time-series data. We show that the drift and the diffusion coefficients of this SDE can be adjusted after training, and that a specific choice that minimizes the impact of the estimation error gives a Föllmer process. We highlight the utility of our approach on several complex, high-dimensional forecasting problems, including stochastically forced Navier-Stokes and video prediction on the KTH and CLEVRER datasets.
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
Graph Neural Networks for Learning Equivariant Representations of Neural Networks
Authors:
Miltiadis Kofinas,
Boris Knyazev,
Yan Zhang,
Yunlu Chen,
Gertjan J. Burghouts,
Efstratios Gavves,
Cees G. M. Snoek,
David W. Zhang
Abstract:
Neural networks that process the parameters of other neural networks find applications in domains as diverse as classifying implicit neural representations, generating neural network weights, and predicting generalization errors. However, existing approaches either overlook the inherent permutation symmetry in the neural network or rely on intricate weight-sharing patterns to achieve equivariance,…
▽ More
Neural networks that process the parameters of other neural networks find applications in domains as diverse as classifying implicit neural representations, generating neural network weights, and predicting generalization errors. However, existing approaches either overlook the inherent permutation symmetry in the neural network or rely on intricate weight-sharing patterns to achieve equivariance, while ignoring the impact of the network architecture itself. In this work, we propose to represent neural networks as computational graphs of parameters, which allows us to harness powerful graph neural networks and transformers that preserve permutation symmetry. Consequently, our approach enables a single model to encode neural computational graphs with diverse architectures. We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations, predicting generalization performance, and learning to optimize, while consistently outperforming state-of-the-art methods. The source code is open-sourced at https://github.com/mkofinas/neural-graphs.
△ Less
Submitted 20 March, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Do CLIPs Always Generalize Better than ImageNet Models?
Authors:
Qizhou Wang,
Yong Lin,
Yongqiang Chen,
Ludwig Schmidt,
Bo Han,
Tong Zhang
Abstract:
Large vision language models, such as CLIPs, have revolutionized modern machine learning. CLIPs have demonstrated great generalizability under distribution shifts, supported by an increasing body of literature. However, the evaluation datasets for CLIPs are variations primarily designed for ImageNet benchmarks, which may not fully reflect the extent to which CLIPs, e.g., pre-trained on LAION, robu…
▽ More
Large vision language models, such as CLIPs, have revolutionized modern machine learning. CLIPs have demonstrated great generalizability under distribution shifts, supported by an increasing body of literature. However, the evaluation datasets for CLIPs are variations primarily designed for ImageNet benchmarks, which may not fully reflect the extent to which CLIPs, e.g., pre-trained on LAION, robust to spurious correlations. To bridge the gap, we collect a real-world dataset called CounterAnimal that contains realistic spurious features found in animal photos. CounterAnimal consists of a) the common group: comprising animals on common backgrounds, and b) the counter group: including animals on unusual backgrounds. The performance drops from the common to counter groups quantify the reliance of models on spurious features (i.e., backgrounds) to predict the animals. We find that CLIPs trained on either LAION or the OpenAI data exhibit notable performance drops on the counter group. Surprisingly, we observe that single-modal models trained on ImageNet are more robust than CLIPs. We provide both theoretical and empirical explanations for why CLIPs still learn spurious features. Our findings suggest that distribution shifts remain an open problem for CLIPs, and one needs to be cautious about test setups when evaluating foundation models pre-trained on a significantly different scale and distribution.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs
Authors:
Matthew Zurek,
Yudong Chen
Abstract:
We study the sample complexity of learning an $\varepsilon$-optimal policy in an average-reward Markov decision process (MDP) under a generative model. For weakly communicating MDPs, we establish the complexity bound $\widetilde{O}(SA\frac{H}{\varepsilon^2} )$, where $H$ is the span of the bias function of the optimal policy and $SA$ is the cardinality of the state-action space. Our result is the…
▽ More
We study the sample complexity of learning an $\varepsilon$-optimal policy in an average-reward Markov decision process (MDP) under a generative model. For weakly communicating MDPs, we establish the complexity bound $\widetilde{O}(SA\frac{H}{\varepsilon^2} )$, where $H$ is the span of the bias function of the optimal policy and $SA$ is the cardinality of the state-action space. Our result is the first that is minimax optimal (up to log factors) in all parameters $S,A,H$, and $\varepsilon$, improving on existing work that either assumes uniformly bounded mixing times for all policies or has suboptimal dependence on the parameters. We also initiate the study of sample complexity in general (multichain) average-reward MDPs. We argue a new transient time parameter $B$ is necessary, establish an $\widetilde{O}(SA\frac{B + H}{\varepsilon^2})$ complexity bound, and prove a matching (up to log factors) minimax lower bound. Both results are based on reducing the average-reward MDP to a discounted MDP, which requires new ideas in the general setting. To optimally analyze this reduction, we develop improved bounds for $γ$-discounted MDPs, showing that $\widetilde{O}(SA\frac{H}{(1-γ)^2\varepsilon^2} )$ and $\widetilde{O}(SA\frac{B + H}{(1-γ)^2\varepsilon^2} )$ samples suffice to learn $\varepsilon$-optimal policies in weakly communicating and in general MDPs, respectively. Both these results circumvent the well-known minimax lower bound of $\widetildeΩ(SA\frac{1}{(1-γ)^3\varepsilon^2} )$ for $γ$-discounted MDPs, and establish a quadratic rather than cubic horizon dependence for a fixed MDP instance.
△ Less
Submitted 4 June, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Effects of model misspecification on small area estimators
Authors:
Yuting Chen,
Partha Lahiri,
Nicola Salvati
Abstract:
Nested error regression models are commonly used to incorporate observational unit specific auxiliary variables to improve small area estimates. When the mean structure of this model is misspecified, there is generally an increase in the mean square prediction error (MSPE) of Empirical Best Linear Unbiased Predictors (EBLUP). Observed Best Prediction (OBP) method has been proposed with the intent…
▽ More
Nested error regression models are commonly used to incorporate observational unit specific auxiliary variables to improve small area estimates. When the mean structure of this model is misspecified, there is generally an increase in the mean square prediction error (MSPE) of Empirical Best Linear Unbiased Predictors (EBLUP). Observed Best Prediction (OBP) method has been proposed with the intent to improve on the MSPE over EBLUP. We conduct a Monte Carlo simulation experiment to understand the effect of mispsecification of mean structures on different small area estimators. Our simulation results lead to an unexpected result that OBP may perform very poorly when observational unit level auxiliary variables are used and that OBP can be improved significantly when population means of those auxiliary variables (area level auxiliary variables) are used in the nested error regression model or when a corresponding area level model is used. Our simulation also indicates that the MSPE of OBP in an increasing function of the difference between the sample and population means of the auxiliary variables.
△ Less
Submitted 17 March, 2024;
originally announced March 2024.
-
Identifying Causal Effects Under Functional Dependencies
Authors:
Yizuo Chen,
Adnan Darwiche
Abstract:
We study the identification of causal effects, motivated by two improvements to identifiability which can be attained if one knows that some variables in a causal graph are functionally determined by their parents (without needing to know the specific functions). First, an unidentifiable causal effect may become identifiable when certain variables are functional. Second, certain functional variabl…
▽ More
We study the identification of causal effects, motivated by two improvements to identifiability which can be attained if one knows that some variables in a causal graph are functionally determined by their parents (without needing to know the specific functions). First, an unidentifiable causal effect may become identifiable when certain variables are functional. Second, certain functional variables can be excluded from being observed without affecting the identifiability of a causal effect, which may significantly reduce the number of needed variables in observational data. Our results are largely based on an elimination procedure which removes functional variables from a causal graph while preserving key properties in the resulting causal graph, including the identifiability of causal effects.
△ Less
Submitted 22 May, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Accelerating Convergence of Score-Based Diffusion Models, Provably
Authors:
Gen Li,
Yu Huang,
Timofey Efimov,
Yuting Wei,
Yuejie Chi,
Yuxin Chen
Abstract:
Score-based diffusion models, while achieving remarkable empirical performance, often suffer from low sampling speed, due to extensive function evaluations needed during the sampling phase. Despite a flurry of recent activities towards speeding up diffusion generative modeling in practice, theoretical underpinnings for acceleration techniques remain severely limited. In this paper, we design novel…
▽ More
Score-based diffusion models, while achieving remarkable empirical performance, often suffer from low sampling speed, due to extensive function evaluations needed during the sampling phase. Despite a flurry of recent activities towards speeding up diffusion generative modeling in practice, theoretical underpinnings for acceleration techniques remain severely limited. In this paper, we design novel training-free algorithms to accelerate popular deterministic (i.e., DDIM) and stochastic (i.e., DDPM) samplers. Our accelerated deterministic sampler converges at a rate $O(1/{T}^2)$ with $T$ the number of steps, improving upon the $O(1/T)$ rate for the DDIM sampler; and our accelerated stochastic sampler converges at a rate $O(1/T)$, outperforming the rate $O(1/\sqrt{T})$ for the DDPM sampler. The design of our algorithms leverages insights from higher-order approximation, and shares similar intuitions as popular high-order ODE solvers like the DPM-Solver-2. Our theory accommodates $\ell_2$-accurate score estimates, and does not require log-concavity or smoothness on the target distribution.
△ Less
Submitted 6 March, 2024;
originally announced March 2024.
-
Entry-Specific Bounds for Low-Rank Matrix Completion under Highly Non-Uniform Sampling
Authors:
Xumei Xi,
Christina Lee Yu,
Yudong Chen
Abstract:
Low-rank matrix completion concerns the problem of estimating unobserved entries in a matrix using a sparse set of observed entries. We consider the non-uniform setting where the observed entries are sampled with highly varying probabilities, potentially with different asymptotic scalings. We show that under structured sampling probabilities, it is often better and sometimes optimal to run estimat…
▽ More
Low-rank matrix completion concerns the problem of estimating unobserved entries in a matrix using a sparse set of observed entries. We consider the non-uniform setting where the observed entries are sampled with highly varying probabilities, potentially with different asymptotic scalings. We show that under structured sampling probabilities, it is often better and sometimes optimal to run estimation algorithms on a smaller submatrix rather than the entire matrix. In particular, we prove error upper bounds customized to each entry, which match the minimax lower bounds under certain conditions. Our bounds characterize the hardness of estimating each entry as a function of the localized sampling probabilities. We provide numerical experiments that confirm our theoretical findings.
△ Less
Submitted 29 February, 2024;
originally announced March 2024.
-
DIGIC: Domain Generalizable Imitation Learning by Causal Discovery
Authors:
Yang Chen,
Yitao Liang,
Zhouchen Lin
Abstract:
Causality has been combined with machine learning to produce robust representations for domain generalization. Most existing methods of this type require massive data from multiple domains to identify causal features by cross-domain variations, which can be expensive or even infeasible and may lead to misidentification in some cases. In this work, we make a different attempt by leveraging the demo…
▽ More
Causality has been combined with machine learning to produce robust representations for domain generalization. Most existing methods of this type require massive data from multiple domains to identify causal features by cross-domain variations, which can be expensive or even infeasible and may lead to misidentification in some cases. In this work, we make a different attempt by leveraging the demonstration data distribution to discover the causal features for a domain generalizable policy. We design a novel framework, called DIGIC, to identify the causal features by finding the direct cause of the expert action from the demonstration data distribution via causal discovery. Our framework can achieve domain generalizable imitation learning with only single-domain data and serve as a complement for cross-domain variation-based methods under non-structural assumptions on the underlying causal models. Our empirical study in various control tasks shows that the proposed framework evidently improves the domain generalization performance and has comparable performance to the expert in the original domain simultaneously.
△ Less
Submitted 29 February, 2024;
originally announced February 2024.