Search | arXiv e-print repository

Evaluating Model Performance Under Worst-case Subpopulations

Authors: Mike Li, Hongseok Namkoong, Shangzhou Xia

Abstract: The performance of ML models degrades when the training population is different from that seen under operation. Towards assessing distributional robustness, we study the worst-case performance of a model over all subpopulations of a given size, defined with respect to core attributes Z. This notion of robustness can consider arbitrary (continuous) attributes Z, and automatically accounts for complex intersectionality in disadvantaged groups. We develop a scalable yet principled two-stage estimation procedure that can evaluate the robustness of state-of-the-art models. We prove that our procedure enjoys several finite-sample convergence guarantees, including dimension-free convergence. Instead of overly conservative notions based on Rademacher complexities, our evaluation error depends on the dimension of Z only through the out-of-sample error in estimating the performance conditional on Z. On real datasets, we demonstrate that our method certifies the robustness of a model and prevents deployment of unreliable models.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Earlier version appeared in the proceedings of Advances in Neural Information Processing Systems 34 (NeurIPS 2021): https://proceedings.neurips.cc/paper_files/paper/2021/file/908075ea2c025c335f4865f7db427062-Paper.pdf
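
As a rough illustration of the worst-case subpopulation quantity described in the abstract above, the sketch below assumes the conditional performance (expected loss given Z) has already been estimated for each evaluation point. It then computes the worst average over subpopulations making up at least a fraction `alpha` of the sample, using the standard conditional value-at-risk dual form. The function name `worst_case_subpopulation_loss`, the parameter `alpha`, and the plug-in sample form are assumptions made for illustration; they are not taken from the paper.

```python
def worst_case_subpopulation_loss(cond_losses, alpha):
    """Plug-in worst-case mean loss over subpopulations of proportion >= alpha.

    cond_losses: estimated conditional losses, one per evaluation point
                 (assumed to come from a separately fitted first-stage model).
    alpha:       minimum subpopulation proportion, in (0, 1].

    Uses the dual form  min_eta { eta + mean(max(loss - eta, 0)) / alpha }.
    The objective is convex and piecewise linear in eta, so its minimum is
    attained at one of the sample values.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    n = len(cond_losses)
    if n == 0:
        raise ValueError("cond_losses must be non-empty")

    best = float("inf")
    for eta in cond_losses:
        excess = sum(max(m - eta, 0.0) for m in cond_losses) / n
        best = min(best, eta + excess / alpha)
    return best
```

With `alpha = 1` the value reduces to the ordinary average loss; smaller values of `alpha` put more weight on the hardest part of the sample.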

arXiv:2406.13036 [pdf, other]

Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities

Authors: Matthew T. C. Li, Tiangang Cui, Fengyi Li, Youssef Marzouk, Olivier Zahm

Abstract: Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $π$ as a perturbation of a given reference measure $μ$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Gaussian, as commonly arising in generative modeling. Our method extends prior work on minimizing majorizations of the Kullback--Leibler divergence to identify optimal approximations within this class of measures. Our main contribution unveils a connection between the \emph{dimensional} logarithmic Sobolev inequality (LSI) and approximations with this ansatz. Specifically, when the target and reference are both Gaussian, we show that minimizing the dimensional LSI is equivalent to minimizing the KL divergence restricted to this ansatz. For general non-Gaussian measures, the dimensional LSI produces majorants that uniformly improve on previous majorants for gradient-based dimension reduction. We further demonstrate the applicability of this analysis to the squared Hellinger distance, where analogous reasoning shows that the dimensional Poincaré inequality offers improved bounds.

Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.03707 [pdf, other]

What Should Embeddings Embed? Autoregressive Models Represent Latent Generating Distributions

Authors: Liyi Zhang, Michael Y. Li, Thomas L. Griffiths

Abstract: Autoregressive language models have demonstrated a remarkable ability to extract latent structure from text. The embeddings from large language models have been shown to capture aspects of the syntax and semantics of language. But what {\em should} embeddings represent? We connect the autoregressive prediction objective to the idea of constructing predictive sufficient statistics to summarize the information contained in a sequence of observations, and use this connection to identify three settings where the optimal content of embeddings can be identified: independent identically distributed data, where the embedding should capture the sufficient statistics of the data; latent state models, where the embedding should encode the posterior distribution over states given the data; and discrete hypothesis spaces, where the embedding should reflect the posterior distribution over hypotheses given the data. We then conduct empirical probing studies to show that transformers encode these three kinds of latent generating distributions, and that they perform well in out-of-distribution cases and without token memorization in these settings.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 15 pages, 8 figures

ACM Class: I.2; I.5

arXiv:2404.19495 [pdf]

Percentage Coefficient (bp) -- Effect Size Analysis (Theory Paper 1)

Authors: Xinshu Zhao, Dianshi Moses Li, Ze Zack Lai, Piper Li** Liu, Song Harris Ao, Fei You

Abstract: Percentage coefficient (bp) has emerged in recent publications as an additional and alternative estimator of effect size for regression analysis. This paper retraces the theory behind the estimator. It's posited that an estimator must first serve the fundamental function of enabling researchers and readers to comprehend an estimand, the target of estimation. It may then serve the instrumental function of enabling researchers and readers to compare two or more estimands. Defined as the regression coefficient when dependent variable (DV) and independent variable (IV) are both on conceptual 0-1 percentage scales, percentage coefficients (bp) feature 1) clearly comprehendible interpretation and 2) equitable scales for comparison. The coefficient (bp) serves the two functions effectively and efficiently. It thus serves needs unserved by other indicators, such as raw coefficient (bw) and standardized beta. Another premise of the functionalist theory is that "effect" is not a monolithic concept. Rather, it is a collection of concepts, each of which measures a component of the conglomerate called "effect", thereby serving a subfunction. Regression coefficient (b), for example, indicates the unit change in DV associated with a one-unit increase in IV, thereby measuring one aspect called unit effect, aka efficiency. Percentage coefficient (bp) indicates the percentage change in DV associated with a whole scale increase in IV. It is not meant to be an all-encompassing indicator of an all-encompassing concept, but rather a comprehendible and comparable indicator of efficiency, a key aspect of effect.

Submitted 6 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.
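
The definition in the abstract above (the regression coefficient once DV and IV are both placed on conceptual 0-1 scales) can be sketched for the single-predictor case as follows. This is a minimal illustration under stated assumptions: the helper names `to_unit_scale`, `ols_slope`, and `percentage_coefficient`, and the idea of passing the conceptual scale bounds as `(low, high)` pairs, are hypothetical and do not come from the paper.

```python
def to_unit_scale(values, low, high):
    """Map values from a conceptual [low, high] scale onto [0, 1]."""
    return [(v - low) / (high - low) for v in values]


def ols_slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def percentage_coefficient(x, y, x_bounds, y_bounds):
    """Slope after rescaling IV and DV to their conceptual 0-1 scales.

    x_bounds and y_bounds are the conceptual (not observed) minimum and
    maximum of each scale, supplied by the analyst.
    """
    x01 = to_unit_scale(x, *x_bounds)
    y01 = to_unit_scale(y, *y_bounds)
    return ols_slope(x01, y01)
```

For example, with an IV measured on a 1-5 scale and a DV on a 0-100 scale, a raw slope of 10 corresponds to a rescaled slope of 10 × (5 − 1) / (100 − 0) = 0.4: moving across the whole IV scale is associated with a change of 40% of the DV scale.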

arXiv:2404.17019 [pdf, other]

Neyman Meets Causal Machine Learning: Experimental Evaluation of Individualized Treatment Rules

Authors: Michael Lingzhi Li, Kosuke Imai

Abstract: A century ago, Neyman showed how to evaluate the efficacy of treatment using a randomized experiment under a minimal set of assumptions. This classical repeated sampling framework serves as a basis of routine experimental analyses conducted by today's scientists across disciplines. In this paper, we demonstrate that Neyman's methodology can also be used to experimentally evaluate the efficacy of individualized treatment rules (ITRs), which are derived by modern causal machine learning algorithms. In particular, we show how to account for additional uncertainty resulting from a training process based on cross-fitting. The primary advantage of Neyman's approach is that it can be applied to any ITR regardless of the properties of machine learning algorithms that are used to derive the ITR. We also show, somewhat surprisingly, that for certain metrics, it is more efficient to conduct this ex-post experimental evaluation of an ITR than to conduct an ex-ante experimental evaluation that randomly assigns some units to the ITR. Our analysis demonstrates that Neyman's repeated sampling framework is as relevant for causal inference today as it has been since its inception.

Submitted 25 April, 2024; originally announced April 2024.
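
For readers unfamiliar with the classical framework the abstract above builds on, the sketch below shows the textbook difference-in-means estimator of an average treatment effect in a randomized experiment, together with the usual conservative standard error. It is not the paper's ITR evaluation estimator; the function name `neyman_estimate` and its inputs are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance


def neyman_estimate(treated_outcomes, control_outcomes):
    """Difference-in-means estimate and conservative standard error.

    treated_outcomes / control_outcomes: observed outcomes in each arm of a
    completely randomized experiment (each arm needs at least two units).
    The variance estimate s1^2 / n1 + s0^2 / n0 is the classical
    conservative one for the repeated-sampling variance.
    """
    effect = mean(treated_outcomes) - mean(control_outcomes)
    var_hat = (
        variance(treated_outcomes) / len(treated_outcomes)
        + variance(control_outcomes) / len(control_outcomes)
    )
    return effect, sqrt(var_hat)
```

A call such as `neyman_estimate([3, 5, 7], [1, 2, 3])` returns the estimated effect and its standard error as a pair.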

arXiv:2404.13836 [pdf, other]

MultiFun-DAG: Multivariate Functional Directed Acyclic Graph

Authors: Tian Lan, Ziyue Li, Junpeng Lin, Zhishuai Li, Lei Bai, Man Li, Fugee Tsung, Rui Zhao, Chen Zhang

Abstract: Directed Acyclic Graphical (DAG) models efficiently formulate causal relationships in complex systems. Traditional DAGs assume nodes to be scalar variables, characterizing complex systems under a facile and oversimplified form. This paper considers that nodes can be multivariate functional data and thus proposes a multivariate functional DAG (MultiFun-DAG). It constructs a hidden bilinear multivariate function-to-function regression to describe the causal relationships between different nodes. Then an Expectation-Maximum algorithm is used to learn the graph structure as a score-based algorithm with acyclic constraints. Theoretical properties are diligently derived. Prudent numerical studies and a case study from urban traffic congestion analysis are conducted to show MultiFun-DAG's effectiveness.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2403.11343 [pdf, other]

Federated Transfer Learning with Differential Privacy

Authors: Mengchu Li, Ye Tian, Yang Feng, Yi Yu

Abstract: Federated learning is gaining increasing popularity, with data heterogeneity and privacy being two prominent challenges. In this paper, we address both issues within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of \textit{federated differential privacy}, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy constraint, we study three classical statistical problems, namely univariate mean estimation, low-dimensional linear regression, and high-dimensional linear regression. By investigating the minimax rates and identifying the costs of privacy for these problems, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses incorporate data heterogeneity and privacy, highlighting the fundamental costs of both in federated learning and underscoring the benefit of knowledge transfer across data sets.

Submitted 9 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

Comments: 78 pages, 3 figures

arXiv:2403.07031 [pdf, other]

The Cram Method for Efficient Simultaneous Learning and Evaluation

Authors: Zeyang Jia, Kosuke Imai, Michael Lingzhi Li

Abstract: We introduce the "cram" method, a general and efficient approach to simultaneous learning and evaluation using a generic machine learning (ML) algorithm. In a single pass of batched data, the proposed method repeatedly trains an ML algorithm and tests its empirical performance. Because it utilizes the entire sample for both learning and evaluation, cramming is significantly more data-efficient than sample-splitting. The cram method also naturally accommodates online learning algorithms, making its implementation computationally efficient. To demonstrate the power of the cram method, we consider the standard policy learning setting where cramming is applied to the same data to both develop an individualized treatment rule (ITR) and estimate the average outcome that would result if the learned ITR were to be deployed. We show that under a minimal set of assumptions, the resulting crammed evaluation estimator is consistent and asymptotically normal. While our asymptotic results require a relatively weak stabilization condition of ML algorithm, we develop a simple, generic method that can be used with any policy learning algorithm to satisfy this condition. Our extensive simulation studies show that, when compared to sample-splitting, cramming reduces the evaluation standard error by more than 40% while improving the performance of learned policy. We also apply the cram method to a randomized clinical trial to demonstrate its applicability to real-world problems. Finally, we briefly discuss future extensions of the cram method to other learning and evaluation settings.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.01633 [pdf, other]

Critical windows: non-asymptotic theory for feature emergence in diffusion models

Authors: Marvin Li, Sitan Chen

Abstract: We develop theory to understand an intriguing property of diffusion models for image generation that we term critical windows. Empirically, it has been observed that there are narrow time intervals in sampling during which particular features of the final image emerge, e.g. the image class or background color (Ho et al., 2020b; Meng et al., 2022; Choi et al., 2022; Raya & Ambrogioni, 2023; Georgiev et al., 2023; Sclocchi et al., 2024; Biroli et al., 2024). While this is advantageous for interpretability as it implies one can localize properties of the generation to a small segment of the trajectory, it seems at odds with the continuous nature of the diffusion. We propose a formal framework for studying these windows and show that for data coming from a mixture of strongly log-concave densities, these windows can be provably bounded in terms of certain measures of inter- and intra-group separation. We also instantiate these bounds for concrete examples like well-conditioned Gaussian mixtures. Finally, we use our bounds to give a rigorous interpretation of diffusion models as hierarchical samplers that progressively "decide" output features over a discrete sequence of times. We validate our bounds with synthetic experiments. Additionally, preliminary experiments on Stable Diffusion suggest critical windows may serve as a useful tool for diagnosing fairness and privacy violations in real-world diffusion models.

Submitted 24 May, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.18800 [pdf, other]

BlockEcho: Retaining Long-Range Dependencies for Imputing Block-Wise Missing Data

Authors: Qiao Han, Mingqian Li, Yao Yang, Yiteng Zhai

Abstract: Block-wise missing data poses significant challenges in real-world data imputation tasks. Compared to scattered missing data, block-wise gaps exacerbate adverse effects on subsequent analytic and machine learning tasks, as the lack of local neighboring elements significantly reduces the interpolation capability and predictive power. However, this issue has not received adequate attention. Most SOTA matrix completion methods appeared less effective, primarily due to overreliance on neighboring elements for predictions. We systematically analyze the issue and propose a novel matrix completion method "BlockEcho" for a more comprehensive solution. This method creatively integrates Matrix Factorization (MF) within Generative Adversarial Networks (GAN) to explicitly retain long-distance inter-element relationships in the original matrix. Besides, we incorporate an additional discriminator for GAN, comparing the generator's intermediate progress with pre-trained MF results to constrain high-order feature distributions. Subsequently, we evaluate BlockEcho on public datasets across three domains. Results demonstrate superior performance over both traditional and SOTA methods when imputing block-wise missing data, especially at higher missing rates. The advantage also holds for scattered missing data at high missing rates. We also contribute on the analyses in providing theoretical justification on the optimality and convergence of fusing MF and GAN for missing block data.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.08539 [pdf]

Intelligent Diagnosis of Alzheimer's Disease Based on Machine Learning

Authors: Mingyang Li, Hongyu Liu, Yixuan Li, Zejun Wang, Yuan Yuan, Honglin Dai

Abstract: This study is based on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and aims to explore early detection and disease progression in Alzheimer's disease (AD). We employ innovative data preprocessing strategies, including the use of the random forest algorithm to fill missing data and the handling of outliers and invalid data, thereby fully mining and utilizing these limited data resources. Through Spearman correlation coefficient analysis, we identify some features strongly correlated with AD diagnosis. We build and test three machine learning models using these features: random forest, XGBoost, and support vector machine (SVM). Among them, the XGBoost model performs the best in terms of diagnostic performance, achieving an accuracy of 91%. Overall, this study successfully overcomes the challenge of missing data and provides valuable insights into early detection of Alzheimer's disease, demonstrating its unique research value and practical significance.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.07355 [pdf, ps, other]

Sampling from the Mean-Field Stationary Distribution

Authors: Yunbum Kook, Matthew S. Zhang, Sinho Chewi, Murat A. Erdogdu, Mufan Bill Li

Abstract: We study the complexity of sampling from the stationary distribution of a mean-field SDE, or equivalently, the complexity of minimizing a functional over the space of probability measures which includes an interaction term. Our main insight is to decouple the two key aspects of this problem: (1) approximation of the mean-field SDE via a finite-particle system, via uniform-in-time propagation of chaos, and (2) sampling from the finite-particle stationary distribution, via standard log-concave samplers. Our approach is conceptually simpler and its flexibility allows for incorporating the state-of-the-art for both algorithms and theory. This leads to improved guarantees in numerous settings, including better guarantees for optimizing certain two-layer neural networks in the mean-field regime.

Submitted 18 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.07227 [pdf, other]

Time-Delayed Game Strategy Analysis Among Japan, Other Nations, and the International Atomic Energy Agency in the Context of Fukushima Nuclear Wastewater Discharge Decision

Authors: Mingyang Li, Han Pengsihua, Fujiao Meng, Zejun Wang, Weian Liu

Abstract: This academic paper examines the strategic interactions between Japan, other nations, and the International Atomic Energy Agency (IAEA) regarding Japan's decision to release treated nuclear wastewater from the Fukushima Daiichi Nuclear Power Plant into the sea. It introduces a payoff matrix and time-delay elements in replicator dynamic equations to mirror real-world decision-making delays. The paper analyzes the stability of strategies and conditions for different stable states using characteristic roots of a linearized system and numerical simulations. It concludes that time delays significantly affect decision-making stability and evolution trajectories in nuclear wastewater disposal strategies. The study highlights the importance of efficient wastewater treatment technology, the impact of export tax revenue losses on Japan's strategies, and the role of international cooperation. The novelty of the research lies in integrating time-delay elements from ocean dynamics and governmental decision-making into the game-theoretical model.

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.07210 [pdf, other]

Fukushima Nuclear Wastewater Discharge: An Evolutionary Game Theory Approach to International and Domestic Interaction and Strategic Decision-Making

Authors: Mingyang Li, Han Pengsihua, Songqing Zhao, Zejun Wang, Limin Yang, Weian Liu

Abstract: On August 24, 2023, Japan controversially decided to discharge nuclear wastewater from the Fukushima Daiichi Nuclear Power Plant into the ocean, sparking intense domestic and global debates. This study uses evolutionary game theory to analyze the strategic dynamics between Japan, other countries, and the Japan Fisheries Association. By incorporating economic, legal, international aid, and environmental factors, the research identifies three evolutionarily stable strategies, analyzing them via numerical simulations. The focus is on Japan's shift from wastewater release to its cessation, exploring the myriad factors influencing this transition and their effects on stakeholders' decisions. Key insights highlight the need for international cooperation, rigorous scientific research, public education, and effective wastewater treatment methods. Offering both a fresh theoretical perspective and practical guidance, this study aims to foster global consensus on nuclear wastewater management, crucial for marine conservation and sustainable development.

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2401.16320 [pdf, ps, other]

A Strategy for Preparing Quantum Squeezed States Using Reinforcement Learning

Authors: X. L. Zhao, Y. M. Zhao, M. Li, T. T. Li, Q. Liu, S. Guo, X. X. Yi

Abstract: We propose a scheme leveraging reinforcement learning to engineer control fields for generating non-classical states. It is exemplified by the application to prepare spin-squeezed states for an open collective spin model where a linear control field is designed to govern the dynamics. The reinforcement learning agent determines the temporal sequence of control pulses, commencing from a coherent spin state in an environment characterized by dissipation and dephasing. Compared to the constant control scenario, this approach provides various control sequences maintaining collective spin squeezing and entanglement. It is observed that denser application of the control pulses enhances the performance of the outcomes. However, there is a minor enhancement in the performance by adding control actions. The proposed strategy demonstrates increased effectiveness for larger systems. Thermal excitations of the reservoir are detrimental to the control outcomes. Feasible experiments are suggested to implement this control proposal based on the comparison with the others. The extensions to continuous control problems and another quantum system are discussed. The replaceability of the reinforcement learning module is also emphasized. This research paves the way for its application in manipulating other quantum systems.

Submitted 14 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.14343 [pdf, other]

Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective

Authors: Xuechen Zhang, Mingchen Li, Jiasi Chen, Christos Thrampoulidis, Samet Oymak

Abstract: Modern classification problems exhibit heterogeneities across individual classes: Each class may have unique attributes, such as sample size, label quality, or predictability (easy vs difficult), and variable importance at test-time. Without care, these heterogeneities impede the learning process, most notably, when optimizing fairness objectives. Confirming this, under a gaussian mixture setting, we show that the optimal SVM classifier for balanced accuracy needs to be adaptive to the class attributes. This motivates us to propose CAP: An effective and general method that generates a class-specific learning strategy (e.g. hyperparameter) based on the attributes of that class. This way, optimization process better adapts to heterogeneities. CAP leads to substantial improvements over the naive approach of assigning separate hyperparameters to each class. We instantiate CAP for loss function design and post-hoc logit adjustment, with emphasis on label-imbalanced problems. We show that CAP is competitive with prior art and its flexibility unlocks clear benefits for fairness objectives beyond balanced accuracy. Finally, we evaluate CAP on problems with label noise as well as weighted test objectives to showcase how CAP can jointly adapt to different heterogeneities.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 15 pages, 8 figures

arXiv:2401.00104 [pdf, other]

Causal State Distillation for Explainable Reinforcement Learning

Authors: Wenhao Lu, Xufeng Zhao, Thilo Fryen, Jae Hee Lee, Mengdi Li, Sven Magg, Stefan Wermter

Abstract: Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be quite challenging. This lack of transparency in RL models has been a long-standing problem, making it difficult for users to grasp the reasons behind an agent's behaviour. Various approaches have been explored to address this problem, with one promising avenue being reward decomposition (RD). RD is appealing as it sidesteps some of the concerns associated with other methods that attempt to rationalize an agent's behaviour in a post-hoc manner. RD works by exposing various facets of the rewards that contribute to the agent's objectives during training. However, RD alone has limitations as it primarily offers insights based on sub-rewards and does not delve into the intricate cause-and-effect relationships that occur within an RL agent's neural model. In this paper, we present an extension of RD that goes beyond sub-rewards to provide more informative explanations. Our approach is centred on a causal learning framework that leverages information-theoretic measures for explanation objectives that encourage three crucial properties of causal factors: causal sufficiency, sparseness, and orthogonality. These properties help us distill the cause-and-effect relationships between the agent's states and actions or rewards, allowing for a deeper understanding of its decision-making processes. Our framework is designed to generate local explanations and can be applied to a wide range of RL tasks with multiple reward channels. Through a series of experiments, we demonstrate that our approach offers more meaningful and insightful explanations for the agent's action selections.

Submitted 1 April, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

Comments: https://lukaswill.github.io/; Accepted as oral by CLeaR 2024

arXiv:2311.13768 [pdf, other]

Valid confidence intervals for regression with best subset selection

Authors: Huiming Lin, Meng Li

Abstract: Classical confidence intervals after best subset selection are widely implemented in statistical software and are routinely used to guide practitioners in scientific fields to conclude significance. However, there are increasing concerns in the recent literature about the validity of these confidence intervals in that the intended frequentist coverage is not attained. In the context of the Akaike information criterion (AIC), recent studies observe an under-coverage phenomenon in terms of overfitting, where the estimate of error variance under the selected submodel is smaller than that for the true model. Under-coverage is particularly troubling in selective inference as it points to inflated Type I errors that would invalidate significant findings. In this article, we delineate a complementary, yet provably more deciding factor behind the incorrect coverage of classical confidence intervals under AIC, in terms of altered conditional sampling distributions of pivotal quantities. Resting on selective techniques developed in other settings, our finite-sample characterization of the selection event under AIC uncovers its geometry as a union of finitely many intervals on the real line, based on which we derive new confidence intervals with guaranteed coverage for any sample size. This geometry derived for AIC selection enables exact (and typically less than exact) conditioning, circumventing the need for the excessive conditioning common in other post-selection methods. The proposed methods are easy to implement and can be broadly applied to other commonly used best subset selection criteria. In an application to a classical US consumption dataset, the proposed confidence intervals arrive at different conclusions compared to the conventional ones, even when the selected model is the full model, leading to interpretable findings that better align with empirical observations.

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.07411 [pdf, ps, other]

A Large Deviations Perspective on Policy Gradient Algorithms

Authors: Wouter Jongeneel, Daniel Kuhn, Mengmeng Li

Abstract: Motivated by policy gradient methods in the context of reinforcement learning, we identify a large deviation rate function for the iterates generated by stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-Łojasiewicz condition. Leveraging the contraction principle from large deviations theory, we illustrate the potential of this result by showing how convergence properties of policy gradient with a softmax parametrization and an entropy regularized objective can be naturally extended to a wide spectrum of other policy parametrizations.

Submitted 3 June, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: v3; comments are welcome

MSC Class: 60F10; 90C26

arXiv:2311.03967 [pdf, other]

CeCNN: Copula-enhanced convolutional neural networks in joint prediction of refraction error and axial length based on ultra-widefield fundus images

Authors: Chong Zhong, Yang Li, Danjuan Yang, Meiyan Li, Xingyao Zhou, Bo Fu, Catherine C. Liu, A. H. Welsh

Abstract: Ultra-widefield (UWF) fundus images are replacing traditional fundus images in screening, detection, prediction, and treatment of complications related to myopia because their much broader visual range is advantageous for highly myopic eyes. Spherical equivalent (SE) is extensively used as the main myopia outcome measure, and axial length (AL) has drawn increasing interest as an important ocular component for assessing myopia. Cutting-edge studies show that SE and AL are strongly correlated. Using the joint information from SE and AL is potentially better than using either separately. In the deep learning community, though there is research on multiple-response tasks with a 3D image biomarker, dependence among responses is only sporadically taken into consideration. Inspired by the spirit that information extracted from the data by statistical methods can improve the prediction accuracy of deep learning models, we formulate a class of multivariate response regression models with a higher-order tensor biomarker, for the bivariate tasks of regression-classification and regression-regression. Specifically, we propose a copula-enhanced convolutional neural network (CeCNN) framework that incorporates the dependence between responses through a Gaussian copula (with parameters estimated from a warm-up CNN) and uses the induced copula-likelihood loss with the backbone CNNs. We establish the statistical framework and algorithms for the aforementioned two bivariate tasks. We show that the CeCNN has better prediction accuracy after adding the dependency information to the backbone models. The modeling and the proposed CeCNN algorithm are applicable beyond the UWF scenario and can be effective with other backbones beyond ResNet and LeNet.

Submitted 1 June, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.01287 [pdf, other]

Semiparametric Latent ANOVA Model for Event-Related Potentials

Authors: Cheng-Han Yu, Meng Li, Marina Vannucci

Abstract: Event-related potentials (ERPs) extracted from electroencephalography (EEG) data in response to stimuli are widely used in psychological and neuroscience experiments. A major goal is to link ERP characteristic components to subject-level covariates. Existing methods typically follow two-step approaches, first identifying ERP components using peak detection methods and then relating them to the covariates. This approach, however, can lead to loss of efficiency due to inaccurate estimates in the initial step, especially considering the low signal-to-noise ratio of EEG data. To address this challenge, we propose a semiparametric latent ANOVA model (SLAM) that unifies inference on ERP components and their association to covariates. SLAM models ERP waveforms via a structured Gaussian process prior that encodes ERP latency in its derivative and links the subject-level latencies to covariates using a latent ANOVA. This unified Bayesian framework provides estimation at both population- and subject- levels, improving the efficiency of the inference by leveraging information across subjects. We automate posterior inference and hyperparameter tuning using a Monte Carlo expectation-maximization algorithm. We demonstrate the advantages of SLAM over competing methods via simulations. Our method allows us to examine how factors or covariates affect the magnitude and/or latency of ERP components, which in turn reflect cognitive, psychological or neural processes. We exemplify this via an application to data from an ERP experiment on speech recognition, where we assess the effect of age on two components of interest. Our results verify the scientific findings that older people take a longer reaction time to respond to external stimuli because of the delay in perception and brain processes.

Submitted 2 November, 2023; originally announced November 2023.

Journal ref: Data Science in Science, 2024, 3(1), article 2294204

arXiv:2310.19053 [pdf, other]

Datasets and Benchmarks for Nanophotonic Structure and Parametric Design Simulations

Authors: Jungtaek Kim, Mingxuan Li, Oliver Hinder, Paul W. Leu

Abstract: Nanophotonic structures have versatile applications including solar cells, anti-reflective coatings, electromagnetic interference shielding, optical filters, and light emitting diodes. To design and understand these nanophotonic structures, electrodynamic simulations are essential. These simulations enable us to model electromagnetic fields over time and calculate optical properties. In this work, we introduce frameworks and benchmarks to evaluate nanophotonic structures in the context of parametric structure design problems. The benchmarks are instrumental in assessing the performance of optimization algorithms and identifying an optimal structure based on target optical properties. Moreover, we explore the impact of varying grid sizes in electrodynamic simulations, shedding light on how evaluation fidelity can be strategically leveraged in enhancing structure designs.

Submitted 29 October, 2023; originally announced October 2023.

Comments: 31 pages, 31 figures, 4 tables. Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), Datasets and Benchmarks Track

arXiv:2310.18910 [pdf, other]

InstanT: Semi-supervised Learning with Instance-dependent Thresholds

Authors: Muyang Li, Runze Wu, Haoyu Liu, Jun Yu, Xun Yang, Bo Han, Tongliang Liu

Abstract: Semi-supervised learning (SSL) has been a fundamental challenge in machine learning for decades. The primary family of SSL algorithms, known as pseudo-labeling, involves assigning pseudo-labels to confident unlabeled instances and incorporating them into the training set. Therefore, the selection criteria of confident instances are crucial to the success of SSL. Recently, there has been growing interest in the development of SSL methods that use dynamic or adaptive thresholds. Yet, these methods typically apply the same threshold to all samples, or use class-dependent thresholds for instances belonging to a certain class, while neglecting instance-level information. In this paper, we propose the study of instance-dependent thresholds, which has the highest degree of freedom compared with existing methods. Specifically, we devise a novel instance-dependent threshold function for all unlabeled instances by utilizing their instance-level ambiguity and the instance-dependent error rates of pseudo-labels, so instances that are more likely to have incorrect pseudo-labels will have higher thresholds. Furthermore, we demonstrate that our instance-dependent threshold function provides a bounded probabilistic guarantee for the correctness of the pseudo-labels it assigns.

Submitted 29 October, 2023; originally announced October 2023.

Comments: Accepted as poster for NeurIPS 2023
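
As a rough illustration of the idea in the abstract above, the selection step of pseudo-labeling with one threshold per instance might be sketched as follows. This is not the authors' threshold function: the helper name `select_pseudo_labels` and the threshold values are hypothetical, chosen only to show how a per-instance threshold differs from a single global cutoff.

```python
def select_pseudo_labels(probs, thresholds):
    """Keep instances whose top predicted probability clears their own threshold.

    probs: one list of class probabilities per unlabeled instance.
    thresholds: one (hypothetical) confidence threshold per instance.
    Returns a list of (instance_index, pseudo_label) pairs.
    """
    selected = []
    for i, (p, tau) in enumerate(zip(probs, thresholds)):
        # The pseudo-label is the class with the highest predicted probability.
        label = max(range(len(p)), key=lambda k: p[k])
        # Accept the pseudo-label only if its confidence reaches this instance's threshold.
        if p[label] >= tau:
            selected.append((i, label))
    return selected


# Toy predictions for three unlabeled instances over two classes.
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
# Assumed thresholds: the second instance is treated as more error-prone, so its bar is higher.
thresholds = [0.7, 0.95, 0.75]
pseudo = select_pseudo_labels(probs, thresholds)
```

With a single global threshold of 0.7 the same three instances would be judged by one cutoff; here the second instance is rejected because its own threshold is stricter.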

arXiv:2310.12079 [pdf, other]

Differential Equation Scaling Limits of Shaped and Unshaped Neural Networks

Authors: Mufan Bill Li, Mihai Nica

Abstract: Recent analyses of neural networks with shaped activations (i.e. the activation function is scaled as the network size grows) have led to scaling limits described by differential equations. However, these results do not a priori tell us anything about "ordinary" unshaped networks, where the activation is unchanged as the network size grows. In this article, we find similar differential equation based asymptotic characterization for two types of unshaped networks. Firstly, we show that the following two architectures converge to the same infinite-depth-and-width limit at initialization: (i) a fully connected ResNet with a $d^{-1/2}$ factor on the residual branch, where $d$ is the network depth. (ii) a multilayer perceptron (MLP) with depth $d \ll$ width $n$ and shaped ReLU activation at rate $d^{-1/2}$. Secondly, for an unshaped MLP at initialization, we derive the first order asymptotic correction to the layerwise correlation. In particular, if $ρ_\ell$ is the correlation at layer $\ell$, then $q_t = \ell^2 (1 - ρ_\ell)$ with $t = \frac{\ell}{n}$ converges to an SDE with a singularity at $t=0$. These results together provide a connection between shaped and unshaped network architectures, and open up the possibility of studying the effect of normalization methods and how it connects with shaping activation functions.

Submitted 18 April, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.07973 [pdf, other]

Statistical Performance Guarantee for Subgroup Identification with Generic Machine Learning

Authors: Michael Lingzhi Li, Kosuke Imai

Abstract: Across a wide array of disciplines, many researchers use machine learning (ML) algorithms to identify a subgroup of individuals who are likely to benefit from a treatment the most (``exceptional responders'') or those who are harmed by it. A common approach to this subgroup identification problem consists of two steps. First, researchers estimate the conditional average treatment effect (CATE) using an ML algorithm. Next, they use the estimated CATE to select those individuals who are predicted to be most affected by the treatment, either positively or negatively. Unfortunately, CATE estimates are often biased and noisy. In addition, utilizing the same data to both identify a subgroup and estimate its group average treatment effect results in a multiple testing problem. To address these challenges, we develop uniform confidence bands for estimation of the group average treatment effect sorted by generic ML algorithm (GATES). Using these uniform confidence bands, researchers can identify, with a statistical guarantee, a subgroup whose GATES exceeds a certain effect size, regardless of how this effect size is chosen. The validity of the proposed methodology depends solely on randomization of treatment and random sampling of units. Importantly, our method does not require modeling assumptions and avoids a computationally intensive resampling procedure. A simulation study shows that the proposed uniform confidence bands are reasonably informative and have an appropriate empirical coverage even when the sample size is as small as 100. We analyze a clinical trial of late-stage prostate cancer and find a relatively large proportion of exceptional responders.

Submitted 20 December, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2309.16620 [pdf, other]

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Authors: Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, Cengiz Pehlevan

Abstract: The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $μ$P parameterized networks, where the optimal hyperparameters for small width networks transfer to networks with arbitrarily large width. However, in this scheme, hyperparameters do not transfer across depths. As a remedy, we study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $μ$P parameterization. We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, our empirical findings are supported and motivated by theory. Using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit and show convergence of finite-size network dynamics towards this limit.

Submitted 8 December, 2023; v1 submitted 28 September, 2023; originally announced September 2023.
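
For orientation, the residual branch scaling mentioned in the abstract above is commonly written as the update below. The symbols are assumptions introduced here, not notation taken from the paper: $h^{\ell}$ is the hidden state at layer $\ell$, $L$ is the total depth, and $f$ is a generic residual block with weights $W^{\ell}$.

```latex
h^{\ell+1} = h^{\ell} + \frac{1}{\sqrt{L}} \, f\!\left(h^{\ell}; W^{\ell}\right),
\qquad \ell = 0, 1, \dots, L-1 .
```

Under this reading, each of the $L$ branches contributes a term of size $1/\sqrt{L}$, so if the branch outputs behave like roughly independent zero-mean terms (as at random initialization), their accumulated variance stays of order one as $L$ grows.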

arXiv:2309.11455 [pdf, other]

ddtlcm: An R package for overcoming weak separation in Bayesian latent class analysis via tree-regularization

Authors: Mengbing Li, Bolin Wu, Briana Stephenson, Zhenke Wu

Abstract: Traditional applications of latent class models (LCMs) often focus on scenarios where a set of unobserved classes are well-defined and easily distinguishable. However, in numerous real-world applications, these classes are weakly separated and difficult to distinguish, creating significant numerical challenges. To address these issues, we have developed an R package ddtlcm that provides comprehensive analysis and visualization tools designed to enhance the robustness and interpretability of LCMs in the presence of weak class separation, particularly useful for small sample sizes. This package implements a tree-regularized Bayesian LCM that leverages statistical strength between latent classes to make better estimates using limited data. A Shiny app has also been developed to improve user interactivity. In this paper, we showcase a typical analysis pipeline with simulated data using ddtlcm. All software has been made publicly available on CRAN and GitHub.

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2308.14864 [pdf, other]

NAS-X: Neural Adaptive Smoothing via Twisting

Authors: Dieterich Lawson, Michael Li, Scott Linderman

Abstract: Sequential latent variable models (SLVMs) are essential tools in statistics and machine learning, with applications ranging from healthcare to neuroscience. As their flexibility increases, analytic inference and model learning can become challenging, necessitating approximate methods. Here we introduce neural adaptive smoothing via twisting (NAS-X), a method that extends reweighted wake-sleep (RWS) to the sequential setting by using smoothing sequential Monte Carlo (SMC) to estimate intractable posterior expectations. Combining RWS and smoothing SMC allows NAS-X to provide low-bias and low-variance gradient estimates, and fit both discrete and continuous latent variable models. We illustrate the theoretical advantages of NAS-X over previous methods and explore these advantages empirically in a variety of tasks, including a challenging application to mechanistic models of neuronal dynamics. These experiments show that NAS-X substantially outperforms previous VI- and RWS-based methods in inference and model learning, achieving lower parameter error and tighter likelihood bounds.

Submitted 30 October, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: Updating for clarity and adding new baselines

arXiv:2308.13905 [pdf, ps, other]

Estimation and Hypothesis Testing of Derivatives in Smoothing Spline ANOVA Models

Authors: Ruiqi Liu, Kexuan Li, Meng Li

Abstract: This article studies the derivatives in models that flexibly characterize the relationship between a response variable and multiple predictors, with goals of providing both accurate estimation and inference procedures for hypothesis testing. In the setting of tensor product reproducing spaces for nonparametric multivariate functions, we propose a plug-in kernel ridge regression estimator to estimate the derivatives of the underlying multivariate regression function under the smoothing spline ANOVA model. This estimator has an analytical form, making it simple to implement in practice. We first establish $L_\infty$ and $L_2$ convergence rates of the proposed estimator under general random designs. For derivatives with some selected interesting orders, we provide an in-depth analysis establishing the minimax lower bound, which matches the $L_2$ convergence rate. Additionally, motivated by a wide range of applications, we propose a hypothesis testing procedure to examine whether a derivative is zero. Theoretical results demonstrate that the proposed testing procedure achieves the correct size under the null hypothesis and is asymptotically powerful under local alternatives. For ease of use, we also develop an associated bootstrap algorithm to construct the rejection region and calculate the p-value, and the consistency of the proposed algorithm is established. Simulation studies using synthetic data and an application to a real-world dataset confirm the effectiveness of our methods.

Submitted 26 August, 2023; originally announced August 2023.

arXiv:2307.02126 [pdf, other]

Robust Graph Structure Learning with the Alignment of Features and Adjacency Matrix

Authors: Shaogao Lv, Gang Wen, Shiyu Liu, Linsen Wei, Ming Li

Abstract: To improve the robustness of graph neural networks (GNN), graph structure learning (GSL) has attracted great interest due to the pervasiveness of noise in graph data. Many approaches have been proposed for GSL to jointly learn a clean graph structure and corresponding representations. To extend the previous work, this paper proposes a novel regularized GSL approach, particularly with an alignment of feature information and graph information, which is motivated mainly by our derived lower bound of node-level Rademacher complexity for GNNs. Additionally, our proposed approach incorporates sparse dimensional reduction to leverage low-dimensional node features that are relevant to the graph structure. To evaluate the effectiveness of our approach, we conduct experiments on real-world graphs. The results demonstrate that our proposed GSL method outperforms several competitive baselines, especially in scenarios where the graph structures are heavily affected by noise. Overall, our research highlights the importance of integrating feature and graph information alignment in GSL, as inspired by our derived theoretical result, and showcases the superiority of our approach in handling noisy graph structures through comprehensive experiments on real-world datasets.

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2307.01224 [pdf, other]

INGB: Informed Nonlinear Granular Ball Oversampling Framework for Noisy Imbalanced Classification

Authors: Min Li, Hao Zhou, Qun Liu, Yabin Shao, Guoying Wang

Abstract: In classification problems, the datasets are usually imbalanced, noisy or complex. Most sampling algorithms only make some improvements to the linear sampling mechanism of the synthetic minority oversampling technique (SMOTE). Nevertheless, linear oversampling has several unavoidable drawbacks. Linear oversampling is susceptible to overfitting, and the synthetic samples lack diversity and rarely account for the original distribution characteristics. An informed nonlinear oversampling framework with the granular ball (INGB) as a new direction of oversampling is proposed in this paper. It uses granular balls to simulate the spatial distribution characteristics of datasets, and informed entropy is utilized to further optimize the granular-ball space. Then, nonlinear oversampling is performed by following high-dimensional sparsity and the isotropic Gaussian distribution. Furthermore, INGB has good compatibility. Not only can it be combined with most SMOTE-based sampling algorithms to improve their performance, but it can also be easily extended to noisy imbalanced multi-classification problems. The mathematical model and theoretical proof of INGB are given in this work. Extensive experiments demonstrate that INGB outperforms the traditional linear sampling frameworks and algorithms in oversampling on complex datasets.

Submitted 2 July, 2023; originally announced July 2023.

Comments: 15 pages, 6 figures

arXiv:2306.17759 [pdf, other]

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

Authors: Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy

Abstract: In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a stochastic differential equation (SDE) indexed by the depth-to-width ratio. To achieve a well-defined stochastic limit, the Transformer's attention mechanism is modified by centering the Softmax output at identity, and scaling the Softmax logits by a width-dependent temperature parameter. We examine the stability of the network through the corresponding SDE, showing how the scale of both the drift and diffusion can be elegantly controlled with the aid of residual connections. The existence of a stable SDE implies that the covariance structure is well-behaved, even for very large depth and width, thus preventing the notorious issues of rank degeneracy in deep attention models. Finally, we show, through simulations, that the SDE provides a surprisingly good description of the corresponding finite-size model. We coin the name shaped Transformer for these architectural modifications.

Submitted 9 December, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

arXiv:2306.06581 [pdf, other]

Importance Sparsification for Sinkhorn Algorithm

Authors: Mengyu Li, Jun Yu, Tao Li, Cheng Meng

Abstract: Sinkhorn algorithm has been used pervasively to approximate the solution to optimal transport (OT) and unbalanced optimal transport (UOT) problems. However, its practical application is limited due to the high computational complexity. To alleviate the computational burden, we propose a novel importance sparsification method, called Spar-Sink, to efficiently approximate entropy-regularized OT and UOT solutions. Specifically, our method employs natural upper bounds for unknown optimal transport plans to establish effective sampling probabilities, and constructs a sparse kernel matrix to accelerate Sinkhorn iterations, reducing the computational cost of each iteration from $O(n^2)$ to $\widetilde{O}(n)$ for a sample of size $n$. Theoretically, we show the proposed estimators for the regularized OT and UOT problems are consistent under mild regularity conditions. Experiments on various synthetic data demonstrate Spar-Sink outperforms mainstream competitors in terms of both estimation error and speed. A real-world echocardiogram data analysis shows Spar-Sink can effectively estimate and visualize cardiac cycles, from which one can identify heart failure and arrhythmia. To evaluate the numerical accuracy of cardiac cycle prediction, we consider the task of predicting the end-systole time point using the end-diastole one. Results show Spar-Sink performs as well as the classical Sinkhorn algorithm, requiring significantly less computational time.

Submitted 11 June, 2023; originally announced June 2023.

Comments: Accepted by Journal of Machine Learning Research
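
For orientation, the classical dense Sinkhorn iteration that the abstract above takes as its baseline can be sketched as follows. This is a minimal illustration of the standard algorithm for entropy-regularized OT, not of Spar-Sink itself; the function name `sinkhorn`, the regularization value `eps`, the iteration count, and the toy inputs are all assumptions made for the example.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, n_iter=200):
    """Classical dense Sinkhorn iterations for entropy-regularized OT.

    a, b : source and target probability vectors (lists summing to 1)
    cost : cost matrix as a list of rows, cost[i][j] >= 0
    eps  : entropic regularization strength (illustrative value)
    Returns the approximate transport plan as a list of rows.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / eps). Every iteration touches all n * m
    # entries of this dense matrix, which is the per-iteration O(n^2) cost.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan: diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy problem (hypothetical data): two source points, two target points.
a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
```

At convergence the row sums of `plan` match `a` and the column sums match `b`. A sparsified variant would replace the dense kernel `K` with a sparse approximation, which is where the per-iteration savings described in the abstract would come from.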

arXiv:2306.04700 [pdf, other]

Tree-Regularized Bayesian Latent Class Analysis for Improving Weakly Separated Dietary Pattern Subtyping in Small-Sized Subpopulations

Authors: Mengbing Li, Briana Stephenson, Zhenke Wu

Abstract: Dietary patterns synthesize multiple related diet components, which can be used by nutrition researchers to examine diet-disease relationships. Latent class models (LCMs) have been used to derive dietary patterns from dietary intake assessment, where each class profile represents the probabilities of exposure to a set of diet components. However, LCM-derived dietary patterns can exhibit strong similarities, or weak separation, resulting in numerical and inferential instabilities that challenge scientific interpretation. This issue is exacerbated in small-sized subpopulations. To address these issues, we provide a simple solution that empowers LCMs to improve dietary pattern estimation. We develop a tree-regularized Bayesian LCM that shares statistical strength between dietary patterns to make better estimates using limited data. This is achieved via a Dirichlet diffusion tree process that specifies a prior distribution for the unknown tree over classes. Dietary patterns that share proximity to one another in the tree are shrunk towards ancestral dietary patterns a priori, with the degree of shrinkage varying across pre-specified food groups. Using dietary intake data from the Hispanic Community Health Study/Study of Latinos, we apply the proposed approach to a sample of 496 US adults of South American ethnic background to identify and compare dietary patterns.

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.02831 [pdf, other]

MM-DAG: Multi-task DAG Learning for Multi-modal Data -- with Application for Traffic Congestion Analysis

Authors: Tian Lan, Ziyue Li, Zhishuai Li, Lei Bai, Man Li, Fugee Tsung, Wolfgang Ketter, Rui Zhao, Chen Zhang

Abstract: This paper proposes to learn Multi-task, Multi-modal Directed Acyclic Graphs (MM-DAGs), which are commonly observed in complex systems, e.g., traffic, manufacturing, and weather systems, whose variables are multi-modal with scalars, vectors, and functions. This paper takes the traffic congestion analysis as a concrete case, where a traffic intersection is usually regarded as a DAG. In a road network of multiple intersections, different intersections can only have some overlapping and distinct variables observed. For example, a signalized intersection has traffic light-related variables, whereas unsignalized ones do not. This encourages the multi-task design: with each DAG as a task, the MM-DAG tries to learn the multiple DAGs jointly so that their consensus and consistency are maximized. To this end, we innovatively propose a multi-modal regression for linear causal relationship description of different variables. Then we develop a novel Causality Difference (CD) measure and its differentiable approximator. Compared with existing SOTA measures, CD can penalize the causal structural difference among DAGs with distinct nodes and can better consider the uncertainty of causal orders. We rigorously prove our design's topological interpretation and consistency properties. We conduct thorough simulations and one case study to show the effectiveness of our MM-DAG. The code is available under https://github.com/Lantian72/MM-DAG

Submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted in SIGKDD 2023

arXiv:2305.18987 [pdf, other]

Robust mean change point testing in high-dimensional data with heavy tails

Authors: Mengchu Li, Yudong Chen, Tengyao Wang, Yi Yu

Abstract: We study a mean change point testing problem for high-dimensional data, with exponentially- or polynomially-decaying tails. In each case, depending on the $\ell_0$-norm of the mean change vector, we separately consider dense and sparse regimes. We characterise the boundary between the dense and sparse regimes under the above two tail conditions for the first time in the change point literature and propose novel testing procedures that attain optimal rates in each of the four regimes up to a poly-iterated logarithmic factor. By comparing with previous results under Gaussian assumptions, our results quantify the costs of heavy-tailedness on the fundamental difficulty of change point testing problems for high-dimensional data. To be specific, when the error vectors follow sub-Weibull distributions, a CUSUM-type statistic is shown to achieve a minimax testing rate up to $\sqrt{\log\log(8n)}$. When the error distributions have polynomially-decaying tails, admitting bounded $α$-th moments for some $α\geq 4$, we introduce a median-of-means-type test statistic that achieves a near-optimal testing rate in both dense and sparse regimes. In particular, in the sparse regime, we further propose a computationally-efficient test to achieve the exact optimality. Surprisingly, our investigation in the even more challenging case of $2 \leq α< 4$ unveils a new phenomenon that the minimax testing rate has no sparse regime, i.e., testing sparse changes is information-theoretically as hard as testing dense changes. This phenomenon implies a phase transition of the minimax testing rates at $α= 4$.

Submitted 17 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: 51 pages, 1 figure
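
The median-of-means idea mentioned in the abstract above can be illustrated with the basic scalar estimator. This is a minimal sketch of the textbook median-of-means estimate of a mean, not the change point test statistic of the paper; the function name `median_of_means`, the block count, and the Pareto test sample are assumptions chosen for illustration.

```python
import random
import statistics


def median_of_means(xs, n_blocks):
    """Median-of-means estimate of the mean of xs.

    Splits xs into n_blocks consecutive blocks of equal size (any leftover
    observations at the end are dropped), averages each block, and returns
    the median of the block averages.
    """
    block_size = len(xs) // n_blocks
    if block_size == 0:
        raise ValueError("need at least one observation per block")
    block_means = [
        statistics.fmean(xs[k * block_size:(k + 1) * block_size])
        for k in range(n_blocks)
    ]
    return statistics.median(block_means)


# A single gross outlier corrupts only one block mean, so the median ignores it.
robust = median_of_means([1, 1, 1, 1, 1000, 1], n_blocks=3)

# Heavy-tailed sample (hypothetical): Pareto with shape 3 and scale 1, true mean 1.5.
rng = random.Random(0)
sample = [rng.paretovariate(3.0) for _ in range(10_000)]
estimate = median_of_means(sample, n_blocks=20)
```

The design trade-off is in `n_blocks`: more blocks tolerate more outliers, while fewer blocks give each block mean a smaller variance.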

arXiv:2305.12085 [pdf, other]

Stability and Generalization of lp-Regularized Stochastic Learning for GCN

Authors: Shiyu Liu, Linsen Wei, Shaogao Lv, Ming Li

Abstract: Graph convolutional networks (GCN) are viewed as one of the most popular representations among the variants of graph neural networks over graph data and have shown powerful performance in empirical experiments. $\ell_2$-based graph smoothing enforces the global smoothness of GCN, while (soft) $\ell_1$-based sparse graph learning tends to promote signal sparsity to trade for discontinuity. This paper aims to quantify the trade-off of GCN between smoothness and sparsity, with the help of a general $\ell_p$-regularized $(1<p\leq 2)$ stochastic learning proposed within. While stability-based generalization analyses have been given in prior work for a second derivative objectiveness function, our $\ell_p$-regularized learning scheme does not satisfy such a smooth condition. To tackle this issue, we propose a novel SGD proximal algorithm for GCNs with an inexact operator. For a single-layer GCN, we establish an explicit theoretical understanding of GCN with the $\ell_p$-regularized stochastic learning by analyzing the stability of our SGD proximal algorithm. We conduct multiple empirical experiments to validate our theoretical findings.

Submitted 19 June, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Accepted to IJCAI 2023

arXiv:2305.06172 [pdf, other]

Principal Feature Detection via $Φ$-Sobolev Inequalities

Authors: Matthew T. C. Li, Youssef Marzouk, Olivier Zahm

Abstract: We investigate the approximation of high-dimensional target measures as low-dimensional updates of a dominating reference measure. This approximation class replaces the associated density with the composition of: (i) a feature map that identifies the leading principal components or features of the target measure, relative to the reference, and (ii) a low-dimensional profile function. When the reference measure satisfies a subspace $φ$-Sobolev inequality, we construct a computationally tractable approximation that yields certifiable error guarantees with respect to the Amari $α$-divergences. Our construction proceeds in two stages. First, for any feature map and any $α$-divergence, we obtain an analytical expression for the optimal profile function. Second, for linear feature maps, the principal features are obtained from eigenvectors of a matrix involving gradients of the log-density. Neither step requires explicit access to normalizing constants. Notably, by leveraging the $φ$-Sobolev inequalities, we demonstrate that these features universally certify approximation errors across the range of $α$-divergences $α\in (0,1]$. We then propose an application to Bayesian inverse problems and provide an analogous construction with approximation guarantees that hold in expectation over the data. We conclude with an extension of the proposed dimension reduction strategy to nonlinear feature maps.

Submitted 16 January, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: To appear in Bernoulli, but this version contains both the main file and the supplementary material

arXiv:2305.01789 [pdf, other]

Multivariate Intrinsic Local Polynomial Regression on Isometric Riemannian Manifolds: Applications to Positive Definite Data

Authors: Ronaldo García Reyes, Ying Wang, Min Li, Marlis Ontiviero Ortega, Deirel Paz-Linares, Lídice Galán García, Pedro Antonio Valdez Sosa

Abstract: The paper introduces a novel non-parametric Riemannian regression method using Isometric Riemannian Manifolds (IRMs). The proposed technique, Intrinsic Local Polynomial Regression on IRMs (ILPR-IRMs), enables global data mapping between Riemannian manifolds while preserving underlying geometries. The ILPR method is generalized to handle multivariate covariates on any Riemannian manifold and isometry. Specifically, for manifolds equipped with Euclidean Pullback Metrics (EPMs), a closed analytical formula is derived for the multivariate ILPR (ILPR-EPM). Asymptotic statistical properties of the ILPR-EPM for the multivariate local linear case are established, including a formula for the asymptotic bias, establishing estimator consistency. The paper showcases possible applications of the method by focusing on a group of Riemannian metrics on the Symmetric Positive Definite (SPD) manifold, which arises in machine learning and neuroscience. It is demonstrated that several metrics on the SPD manifold are EPMs, resulting in a closed analytical expression for the multivariate ILPR estimator on the SPD manifold. The paper evaluates the ILPR estimator's performance under two specific EPMs, Log-Cholesky and Log-Euclidean, on simulated data on the SPD manifold and compares it with extrinsic LPR using the Affine-Invariant when scaling the manifold and covariate dimension. The results show that the ILPR using the Log-Cholesky metric is computationally faster and provides a better trade-off between error and time than other metrics. Finally, the Log-Cholesky metric on the SPD manifold is employed to implement an efficient and intrinsic version of Rie-SNE for visualizing high-dimensional SPD data. The code for implementing ILPR-EPMs and other relevant calculations is available on the GitHub page.

Submitted 2 May, 2023; originally announced May 2023.

Comments: 32 pages, 6 figures, 1 pseudocode

MSC Class: 53B12; 62R30; 62G05; 62-08; 62-11

arXiv:2304.11894 [pdf, other]

Estimating Failure Probability with Neural Operator Hybrid Approach

Authors: Mu**g Li, Yani Feng, Guanjie Wang

Abstract: Evaluating failure probability for complex engineering systems is a computationally intensive task. While the Monte Carlo method is easy to implement, it converges slowly and, hence, requires numerous repeated simulations of a complex system to generate sufficient samples. To improve the efficiency, methods based on surrogate models are proposed to approximate the limit state function. In this work, we reframe the approximation of the limit state function as an operator learning problem and utilize the DeepONet framework with a hybrid approach to estimate the failure probability. The numerical results show that our proposed method outperforms the prior neural hybrid method.

Submitted 25 June, 2023; v1 submitted 24 April, 2023; originally announced April 2023.
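
The plain Monte Carlo baseline that the abstract above describes as easy to implement but slow to converge can be sketched as follows. This is a minimal illustration only: the limit state function `limit_state`, the standard normal inputs, and the sample size are hypothetical choices, and no surrogate or DeepONet component is shown.

```python
import math
import random


def limit_state(x1, x2):
    # Hypothetical limit state function g(x); the system fails when g(x) < 0.
    return 3.0 - (x1 + x2)


def mc_failure_probability(n_samples, seed=0):
    """Crude Monte Carlo estimate of P(g(X) < 0) for X ~ N(0, I_2)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        # Each sample needs one evaluation of g; for a real engineering
        # system this evaluation is the expensive simulation.
        if limit_state(x1, x2) < 0.0:
            failures += 1
    return failures / n_samples


p_hat = mc_failure_probability(100_000)

# For this toy g, X1 + X2 ~ N(0, 2), so the exact value is available
# in closed form: P(X1 + X2 > 3) = 0.5 * erfc(3 / 2).
p_exact = 0.5 * math.erfc(1.5)
```

The standard error of `p_hat` is about `sqrt(p * (1 - p) / n_samples)`, so rare failures require many evaluations of `g`, which is the cost that surrogate-based methods aim to reduce.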

arXiv:2303.00288

The Race of mRNA therapy: Evidence from Patent Landscape

Authors: Jianxiong Ren, Xiaoming Zhang, Xingyong Si, Xiangjun Kong, **yu Cong, **** Wang, Xiang Li, Qianru Zhang, Peifen Yao, Mengyao Li, Yuanqi Cai, Zhaocai Sun, Kunmeng Liu, Benzheng Wei

Abstract: mRNA therapy is gaining worldwide attention as an emerging therapeutic approach. The widespread use of mRNA vaccines during the COVID-19 outbreak has demonstrated the potential of mRNA therapy. As mRNA-based drugs have expanded and their indications have broadened, more patents for mRNA innovations have emerged. The global patent landscape for mRNA therapy has not yet been analyzed, indicating a research gap in need of filling, from new technology to productization. This study uses social network analysis with the patent quality assessment to investigate the temporal trends, citation relationship, and significant litigation for 16,101 mRNA therapy patents and summarizes the hot topics and potential future directions for this industry. The information obtained in this study not only may be utilized as a tool of knowledge for researchers in a comprehensive and integrated way but can also provide inspiration for efficient production methods for mRNA drugs. This study shows that infectious diseases and cancer are currently the primary applications for mRNA drugs. Emerging patent activity and lawsuits in this field are demonstrating that delivery technology remains one of the key challenges in the field and that drug-targeting research in combination with vector technology will be one of the major directions for the industry going forward. With significant funding, new organizations have developed novel delivery technologies in an attempt to break into the patent thicket established by companies such as Arbutus. The global mRNA therapeutic landscape is undergoing a multifaceted development pattern, and the monopoly of giant companies is being challenged.

Submitted 15 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: I have received requests from co-authors and funding agencies to withdraw the manuscript

arXiv:2302.08049 [pdf, ps, other]

Improved Discretization Analysis for Underdamped Langevin Monte Carlo

Authors: Matthew Zhang, Sinho Chewi, Mufan Bill Li, Krishnakumar Balasubramanian, Murat A. Erdogdu

Abstract: Underdamped Langevin Monte Carlo (ULMC) is an algorithm used to sample from unnormalized densities by leveraging the momentum of a particle moving in a potential well. We provide a novel analysis of ULMC, motivated by two central questions: (1) Can we obtain improved sampling guarantees beyond strong log-concavity? (2) Can we achieve acceleration for sampling? For (1), prior results for ULMC only hold under a log-Sobolev inequality together with a restrictive Hessian smoothness condition. Here, we relax these assumptions by removing the Hessian smoothness condition and by considering distributions satisfying a Poincaré inequality. Our analysis achieves the state-of-the-art dimension dependence, and is also flexible enough to handle weakly smooth potentials. As a byproduct, we also obtain the first KL divergence guarantees for ULMC without Hessian smoothness under strong log-concavity, which is based on a new result on the log-Sobolev constant along the underdamped Langevin diffusion. For (2), the recent breakthrough of Cao, Lu, and Wang (2020) established the first accelerated result for sampling in continuous time via PDE methods. Our discretization analysis translates their result into an algorithmic guarantee, which indeed enjoys better condition number dependence than prior works on ULMC, although we leave open the question of full acceleration in discrete time. Both (1) and (2) necessitate Rényi discretization bounds, which are more challenging than the typically used Wasserstein coupling arguments. We address this using a flexible discretization analysis based on Girsanov's theorem that easily extends to more general settings.

Submitted 15 February, 2023; originally announced February 2023.

arXiv:2302.04438 [pdf, other]

An information-theoretic learning model based on importance sampling

Authors: Jiangshe Zhang, Lizhen Ji, Fei Gao, Mengyao Li

Abstract: A crucial assumption underlying most current machine learning theory is that the training distribution is identical to the test distribution. However, this assumption may not hold in some real-world applications. In this paper, we develop a learning model based on principles of information theory by minimizing the worst-case loss at prescribed levels of uncertainty. We reformulate the empirical estimation of the risk functional and the distribution deviation constraint based on the importance sampling method. The objective of the proposed approach is to minimize the loss under maximum degradation and hence the resulting problem is a minimax problem which can be converted to an unconstrained minimum problem using the Lagrange method with the Lagrange multiplier $T$. We reveal that the minimization of the objective function under logarithmic transformation is equivalent to the minimization of the p-norm loss with $p=\frac{1}{T}$. We applied the proposed model to the face verification task on Racial Faces in the Wild datasets and showed that the proposed model performs better under large distribution deviations.

Submitted 22 February, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

Comments: 7 pages, 4 figures

arXiv:2302.02895 [pdf, other]

Flexible and Probabilistic Topology Tracking with Partial Optimal Transport

Authors: Mingzhe Li, Xinyuan Yan, Lin Yan, Tom Needham, Bei Wang

Abstract: In this paper, we present a flexible and probabilistic framework for tracking topological features in time-varying scalar fields using merge trees and partial optimal transport. Merge trees are topological descriptors that record the evolution of connected components in the sublevel sets of scalar fields. We present a new technique for modeling and comparing merge trees using tools from partial optimal transport. In particular, we model a merge tree as a measure network, that is, a network equipped with a probability distribution, and define a notion of distance on the space of merge trees inspired by partial optimal transport. Such a distance offers a new and flexible perspective for encoding intrinsic and extrinsic information in the comparative measures of merge trees. More importantly, it gives rise to a partial matching between topological features in time-varying data, thus enabling flexible topology tracking for scientific simulations. Furthermore, such partial matching may be interpreted as probabilistic coupling between features at adjacent time steps, which gives rise to probabilistic tracking graphs. We derive a stability result for our distance and provide numerous experiments indicating the efficacy of distance in extracting meaningful feature tracks.

Submitted 6 February, 2023; originally announced February 2023.

arXiv:2212.01998 [pdf]

An operational framework to automatically evaluate the quality of weather observations from third-party stations

Authors: Quanxi Shao, Ming Li, Joel Janek Dabrowski, Shuvo Bakar, Ashfaqur Rahman, Andrea Powell, Brent Henderson

Abstract: With an increasing number of crowdsourced private automatic weather stations (called TPAWS) established to fill gaps in the official network and obtain local weather information for various purposes, data quality is a major concern in promoting their usage. Proper quality control and assessment are necessary to reach mutual agreement on the TPAWS observations. To derive near real-time assessment for an operational system, we propose a simple, scalable and interpretable framework based on AI/Stats/ML models. The framework constructs separate models for individual data from official sources and then provides the final assessment by fusing the individual models. The performance of our proposed framework is evaluated by synthetic data and demonstrated by applying it to a real TPAWS network.

Submitted 4 December, 2022; originally announced December 2022.

Comments: 9 pages, 2 figures, AI4 Environment conference

arXiv:2211.04528 [pdf, other]

Quality Control in Weather Monitoring with Dynamic Linear Models

Authors: Joel Janek Dabrowski, Ashfaqur Rahman, Ming Li, Quanxi Shao, Shuvo Bakar, Andrea Powell, Brent Henderson

Abstract: Decisions in agriculture are frequently based on weather. With an increase in the availability and affordability of off-the-shelf weather stations, farmers are able to acquire localised weather information. However, with uncertainty in the sensor and installation quality, farmers are at risk of making poor decisions based on incorrect data. We present an automated approach to perform quality control on weather sensors. Our approach uses time-series modelling and data fusion with Bayesian principles to provide predictions with uncertainty quantification. These predictions and uncertainty are used to estimate the validity of a sensor observation. We test on temperature, wind, and humidity data and achieve error hit rates above 80% and false negative rates below 11%.

Submitted 2 March, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

Comments: Published in The 2nd AAAI Workshop on AI for Agriculture and Food Systems, 2023

arXiv:2211.03983 [pdf, other]

Doubly Inhomogeneous Reinforcement Learning

Authors: Liyuan Hu, Mengbing Li, Chengchun Shi, Zhenke Wu, Piotr Fryzlewicz

Abstract: This paper studies reinforcement learning (RL) in doubly inhomogeneous environments under temporal non-stationarity and subject heterogeneity. In a number of applications, it is commonplace to encounter datasets generated by system dynamics that may change over time and population, challenging high-quality sequential decision making. Nonetheless, most existing RL solutions require either temporal stationarity or subject homogeneity, which would result in sub-optimal policies if both assumptions were violated. To address both challenges simultaneously, we propose an original algorithm to determine the "best data chunks" that display similar dynamics over time and across individuals for policy learning, which alternates between most recent change point detection and cluster identification. Our method is general, and works with a wide range of clustering and change point detection algorithms. It is multiply robust in the sense that it takes multiple initial estimators as input and only requires one of them to be consistent. Moreover, by borrowing information over time and population, it allows us to detect weaker signals and has better convergence properties when compared to applying the clustering algorithm per time or the change point detection algorithm per subject. Empirically, we demonstrate the usefulness of our method through extensive simulations and a real data application.

Submitted 12 November, 2022; v1 submitted 7 November, 2022; originally announced November 2022.

arXiv:2210.08326 [pdf, ps, other]

Distributionally Robust Causal Inference with Observational Data

Authors: Dimitris Bertsimas, Kosuke Imai, Michael Lingzhi Li

Abstract: We consider the estimation of average treatment effects in observational studies and propose a new framework of robust causal inference with unobserved confounders. Our approach is based on distributionally robust optimization and proceeds in two steps. We first specify the maximal degree to which the distribution of unobserved potential outcomes may deviate from that of observed outcomes. We then derive sharp bounds on the average treatment effects under this assumption. Our framework encompasses the popular marginal sensitivity model as a special case, and we demonstrate how the proposed methodology can address a primary challenge of the marginal sensitivity model that it produces uninformative results when unobserved confounders substantially affect treatment and outcome. Specifically, we develop an alternative sensitivity model, called the distributional sensitivity model, under the assumption that heterogeneity of treatment effect due to unobserved variables is relatively small. Unlike the marginal sensitivity model, the distributional sensitivity model allows for potential lack of overlap and often produces informative bounds even when unobserved variables substantially affect both treatment and outcome. Finally, we show how to extend the distributional sensitivity model to difference-in-differences designs and settings with instrumental variables. Through simulation and empirical studies, we demonstrate the applicability of the proposed methodology.

Submitted 2 February, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

arXiv:2209.11805 [pdf]

Tracking the State and Behavior of People in Response to COVID-19 Through the Fusion of Multiple Longitudinal Data Streams

Authors: Mohamed Amine Bouzaghrane, Hassan Obeid, Drake Hayes, Minnie Chen, Meiqing Li, Madeleine Parker, Daniel A. Rodríguez, Daniel G. Chatman, Karen Trapenberg Frick, Raja Sengupta, Joan Walker

Abstract: The changing nature of the COVID-19 pandemic has highlighted the importance of comprehensively considering its impacts and considering changes over time. Most COVID-19 related research addresses narrowly focused research questions and is therefore limited in addressing the complexities created by the interrelated impacts of the pandemic. Such research generally makes use of only one of either 1) actively collected data such as surveys, or 2) passively collected data. While a few studies make use of both actively and passively collected data, only one other study collects it longitudinally. Here we describe a rich panel dataset of active and passive data from U.S. residents collected between August 2020 and July 2021. Active data includes a repeated survey measuring travel behavior, compliance with COVID-19 mandates, physical health, economic well-being, vaccination status, and other factors. Passively collected data consists of all locations visited by study participants, taken from smartphone GPS data. We also closely tracked COVID-19 policies across counties of residence throughout the study period. Such a dataset allows important research questions to be answered; for example, to determine the factors underlying the heterogeneous behavioral responses to COVID-19 restrictions imposed by local governments. Better information about such responses is critical to our ability to understand the societal and economic impacts of this and future pandemics. The development of this data infrastructure can also help researchers explore new frontiers in behavioral science. The article explains how this approach fills gaps in COVID-19 related data collection; describes the study design and data collection procedures; presents key demographic characteristics of study participants; and shows how fusing different data streams helps uncover behavioral insights.

Submitted 1 October, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

arXiv:2209.11334 [pdf, other]

Evaluating undercounts in epidemics: response to Maruotti et al. 2022

Authors: Michael Li, Jonathan Dushoff, David J. D. Earn, Benjamin M. Bolker

Abstract: Maruotti et al. 2022 used a mark-recapture approach to estimate bounds on the true number of monkeypox infections in various countries. These approaches are fundamentally flawed; it is impossible to estimate undercounting based solely on a single stream of reported cases. Simulations based on a Richards curve for cumulative incidence show that, for reasonable epidemic parameters, the proposed methods estimate bounds on the ascertainment ratio of $\approx 0.2-0.5$ roughly independently of the true ascertainment ratio. These methods should not be used.

Submitted 22 September, 2022; originally announced September 2022.

Showing 1–50 of 210 results for author: Li, M