Search | arXiv e-print repository

Stable Heterogeneous Treatment Effect Estimation across Out-of-Distribution Populations

Authors: Yuling Zhang, Anpeng Wu, Kun Kuang, Liang Du, Zixun Sun, Zhi Wang

Abstract: Heterogeneous treatment effect (HTE) estimation is vital for understanding the change of treatment effect across individuals or subgroups. Most existing HTE estimation methods focus on addressing selection bias induced by imbalanced distributions of confounders between treated and control units, but ignore distribution shifts across populations. Thereby, their applicability has been limited to the… ▽ More Heterogeneous treatment effect (HTE) estimation is vital for understanding the change of treatment effect across individuals or subgroups. Most existing HTE estimation methods focus on addressing selection bias induced by imbalanced distributions of confounders between treated and control units, but ignore distribution shifts across populations. Thereby, their applicability has been limited to the in-distribution (ID) population, which shares a similar distribution with the training dataset. In real-world applications, where population distributions are subject to continuous changes, there is an urgent need for stable HTE estimation across out-of-distribution (OOD) populations, which, however, remains an open problem. As pioneers in resolving this problem, we propose a novel Stable Balanced Representation Learning with Hierarchical-Attention Paradigm (SBRL-HAP) framework, which consists of 1) Balancing Regularizer for eliminating selection bias, 2) Independence Regularizer for addressing the distribution shift issue, 3) Hierarchical-Attention Paradigm for coordination between balance and independence. In this way, SBRL-HAP regresses counterfactual outcomes using ID data, while ensuring the resulting HTE estimation can be successfully generalized to out-of-distribution scenarios, thereby enhancing the model's applicability in real-world settings. Extensive experiments conducted on synthetic and real-world datasets demonstrate the effectiveness of our SBRL-HAP in achieving stable HTE estimation across OOD populations, with an average 10% reduction in the error metric PEHE and 11% decrease in the ATE bias, compared to the SOTA methods. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted by ICDE'2024

arXiv:2407.02539 [pdf]

Research on Autonomous Robots Navigation based on Reinforcement Learning

Authors: Zixiang Wang, Hao Yan, Yining Wang, Zhengjia Xu, Zhuoyue Wang, Zhizhong Wu

Abstract: Reinforcement learning continuously optimizes decision-making based on real-time feedback reward signals through continuous interaction with the environment, demonstrating strong adaptive and self-learning capabilities. In recent years, it has become one of the key methods to achieve autonomous navigation of robots. In this work, an autonomous robot navigation method based on reinforcement learnin… ▽ More Reinforcement learning continuously optimizes decision-making based on real-time feedback reward signals through continuous interaction with the environment, demonstrating strong adaptive and self-learning capabilities. In recent years, it has become one of the key methods to achieve autonomous navigation of robots. In this work, an autonomous robot navigation method based on reinforcement learning is introduced. We use the Deep Q Network (DQN) and Proximal Policy Optimization (PPO) models to optimize the path planning and decision-making process through the continuous interaction between the robot and the environment, and the reward signals with real-time feedback. By combining the Q-value function with the deep neural network, deep Q network can handle high-dimensional state space, so as to realize path planning in complex environments. Proximal policy optimization is a strategy gradient-based method, which enables robots to explore and utilize environmental information more efficiently by optimizing policy functions. These methods not only improve the robot's navigation ability in the unknown environment, but also enhance its adaptive and self-learning capabilities. Through multiple training and simulation experiments, we have verified the effectiveness and robustness of these models in various complex scenarios. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.02501 [pdf, other]

Data-driven Power Flow Linearization: Theory

Authors: Mengshuo Jia, Gabriela Hug, Ning Zhang, Zhaojian Wang, Yi Wang, Chongqing Kang

Abstract: This two-part tutorial dives into the field of data-driven power flow linearization (DPFL), a domain gaining increased attention. DPFL stands out for its higher approximation accuracy, wide adaptability, and better ability to implicitly incorporate the latest system attributes. This renders DPFL a potentially superior option for managing the significant fluctuations from renewable energy sources,… ▽ More This two-part tutorial dives into the field of data-driven power flow linearization (DPFL), a domain gaining increased attention. DPFL stands out for its higher approximation accuracy, wide adaptability, and better ability to implicitly incorporate the latest system attributes. This renders DPFL a potentially superior option for managing the significant fluctuations from renewable energy sources, a step towards realizing a more sustainable energy future, by translating the higher model accuracy into increased economic efficiency and less energy losses. To conduct a deep and rigorous reexamination, this tutorial first classifies existing DPFL methods into DPFL training algorithms and supportive techniques. Their mathematical models, analytical solutions, capabilities, limitations, and generalizability are systematically examined, discussed, and summarized. In addition, this tutorial reviews existing DPFL experiments, examining the settings of test systems, the fidelity of datasets, and the comparison made among a limited number of DPFL methods. Further, this tutorial implements extensive numerical comparisons of all existing DPFL methods (40 methods in total) and four classic physics-driven approaches, focusing on their generalizability, applicability, accuracy, and computational efficiency. Through these simulationmethodss, this tutorial aims to reveal the actual performance of all the methods (including the performances exposed to data noise or outliers), guiding the selection of appropriate linearization methods. Furthermore, this tutorial discusses future directions based on the theoretical and numerical insights gained. As the first part, this paper reexamines DPFL theories, covering all the training algorithms and supportive techniques. Capabilities, limitations, and aspects of generalizability, which were previously unmentioned in the literature, have been identified. △ Less

Submitted 10 June, 2024; originally announced July 2024.

Comments: 20 pages

arXiv:2406.15575 [pdf, ps, other]

Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

Authors: Mucong Ding, Tahseen Rabbani, Bang An, Evan Z Wang, Furong Huang

Abstract: Graph Neural Networks (GNNs) are widely applied to graph learning problems such as node classification. When scaling up the underlying graphs of GNNs to a larger size, we are forced to either train on the complete graph and keep the full graph adjacency and node embeddings in memory (which is often infeasible) or mini-batch sample the graph (which results in exponentially growing computational com… ▽ More Graph Neural Networks (GNNs) are widely applied to graph learning problems such as node classification. When scaling up the underlying graphs of GNNs to a larger size, we are forced to either train on the complete graph and keep the full graph adjacency and node embeddings in memory (which is often infeasible) or mini-batch sample the graph (which results in exponentially growing computational complexities with respect to the number of GNN layers). Various sampling-based and historical-embedding-based methods are proposed to avoid this exponential growth of complexities. However, none of these solutions eliminates the linear dependence on graph size. This paper proposes a sketch-based algorithm whose training time and memory grow sublinearly with respect to graph size by training GNNs atop a few compact sketches of graph adjacency and node embeddings. Based on polynomial tensor-sketch (PTS) theory, our framework provides a novel protocol for sketching non-linear activations and graph convolution matrices in GNNs, as opposed to existing methods that sketch linear weights or gradients in neural networks. In addition, we develop a locality-sensitive hashing (LSH) technique that can be trained to improve the quality of sketches. Experiments on large-graph benchmarks demonstrate the scalability and competitive performance of our Sketch-GNNs versus their full-size GNN counterparts. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: NeurIPS 2022

arXiv:2406.12017 [pdf, other]

Sparsity-Constraint Optimization via Splicing Iteration

Authors: Zezhi Wang, ** Zhu, Junxian Zhu, Borui Tang, Hongmei Lin, Xueqin Wang

Abstract: Sparsity-constraint optimization has wide applicability in signal processing, statistics, and machine learning. Existing fast algorithms must burdensomely tune parameters, such as the step size or the implementation of precise stop criteria, which may be challenging to determine in practice. To address this issue, we develop an algorithm named Sparsity-Constraint Optimization via sPlicing itEratio… ▽ More Sparsity-constraint optimization has wide applicability in signal processing, statistics, and machine learning. Existing fast algorithms must burdensomely tune parameters, such as the step size or the implementation of precise stop criteria, which may be challenging to determine in practice. To address this issue, we develop an algorithm named Sparsity-Constraint Optimization via sPlicing itEration (SCOPE) to optimize nonlinear differential objective functions with strong convexity and smoothness in low dimensional subspaces. Algorithmically, the SCOPE algorithm converges effectively without tuning parameters. Theoretically, SCOPE has a linear convergence rate and converges to a solution that recovers the true support set when it correctly specifies the sparsity. We also develop parallel theoretical results without restricted-isometry-property-type conditions. We apply SCOPE's versatility and power to solve sparse quadratic optimization, learn sparse classifiers, and recover sparse Markov networks for binary variables. The numerical results on these specific tasks reveal that SCOPE perfectly identifies the true support set with a 10--1000 speedup over the standard exact solver, confirming SCOPE's algorithmic and theoretical merits. Our open-source Python package skscope based on C++ implementation is publicly available on GitHub, reaching a ten-fold speedup on the competing convex relaxation methods implemented by the cvxpy library. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 34 pages

arXiv:2406.09564 [pdf, other]

Towards Domain Adaptive Neural Contextual Bandits

Authors: Ziyan Wang, Hao Wang

Abstract: Contextual bandit algorithms are essential for solving real-world decision making problems. In practice, collecting a contextual bandit's feedback from different domains may involve different costs. For example, measuring drug reaction from mice (as a source domain) and humans (as a target domain). Unfortunately, adapting a contextual bandit algorithm from a source domain to a target domain with d… ▽ More Contextual bandit algorithms are essential for solving real-world decision making problems. In practice, collecting a contextual bandit's feedback from different domains may involve different costs. For example, measuring drug reaction from mice (as a source domain) and humans (as a target domain). Unfortunately, adapting a contextual bandit algorithm from a source domain to a target domain with distribution shift still remains a major challenge and largely unexplored. In this paper, we introduce the first general domain adaptation method for contextual bandits. Our approach learns a bandit model for the target domain by collecting feedback from the source domain. Our theoretical analysis shows that our algorithm maintains a sub-linear regret bound even adapting across domains. Empirical results show that our approach outperforms the state-of-the-art contextual bandit algorithms on real-world datasets. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.06893 [pdf, other]

Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot

Authors: Zixuan Wang, Stanley Wei, Daniel Hsu, Jason D. Lee

Abstract: The transformer architecture has prevailed in various deep learning settings due to its exceptional capabilities to select and compose structural information. Motivated by these capabilities, Sanford et al. proposed the sparse token selection task, in which transformers excel while fully-connected networks (FCNs) fail in the worst case. Building upon that, we strengthen the FCN lower bound to an a… ▽ More The transformer architecture has prevailed in various deep learning settings due to its exceptional capabilities to select and compose structural information. Motivated by these capabilities, Sanford et al. proposed the sparse token selection task, in which transformers excel while fully-connected networks (FCNs) fail in the worst case. Building upon that, we strengthen the FCN lower bound to an average-case setting and establish an algorithmic separation of transformers over FCNs. Specifically, a one-layer transformer trained with gradient descent provably learns the sparse token selection task and, surprisingly, exhibits strong out-of-distribution length generalization. We provide empirical simulations to justify our theoretical findings. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06833 [pdf, other]

Data-driven Power Flow Linearization: Simulation

Authors: Mengshuo Jia, Gabriela Hug, Ning Zhang, Zhaojian Wang, Yi Wang, Chongqing Kang

Abstract: Building on the theoretical insights of Part I, this paper, as the second part of the tutorial, dives deeper into data-driven power flow linearization (DPFL), focusing on comprehensive numerical testing. The necessity of these simulations stems from the theoretical analysis's inherent limitations, particularly the challenge of identifying the differences in real-world performance among DPFL method… ▽ More Building on the theoretical insights of Part I, this paper, as the second part of the tutorial, dives deeper into data-driven power flow linearization (DPFL), focusing on comprehensive numerical testing. The necessity of these simulations stems from the theoretical analysis's inherent limitations, particularly the challenge of identifying the differences in real-world performance among DPFL methods with overlap** theoretical capabilities and/or limitations. The absence of a comprehensive numerical comparison of DPFL approaches in the literature also motivates this paper, especially given the fact that over 95% of existing DPFL studies have not provided any open-source codes. To bridge the gap, this paper first reviews existing DPFL experiments, examining the adopted test scenarios, load fluctuation settings, data sources, considerations for data noise/outliers, and the comparison made so far. Subsequently, this paper evaluates a total of 44 methods, containing over 30 existing DPFL approaches, some innovative DPFL techniques, and several classic physics-driven power flow linearization methods for benchmarking. The evaluation spans various dimensions, including generalizability, applicability, accuracy, and computational efficiency, using various different test cases scaling from 9-bus to 1354-bus systems. The numerical analysis identifies and examines significant trends and consistent findings across all methods under various test cases. It also offers theoretical insights into phenomena like under-performance, failure, excessive computation times, etc. Overall, this paper identifies the differences in the performances of the wide range of DPFL methods, reveals gaps not evident from theoretical discussions, assists in method selection for real-world applications, and provides thorough discussions on open questions within DPFL research, indicating ten potential future directions. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 26 pages

arXiv:2406.05260 [pdf, other]

Generative modeling of density regression through tree flows

Authors: Zhuoqun Wang, Naoki Awaya, Li Ma

Abstract: A common objective in the analysis of tabular data is estimating the conditional distribution (in contrast to only producing predictions) of a set of "outcome" variables given a set of "covariates", which is sometimes referred to as the "density regression" problem. Beyond estimation on the conditional distribution, the generative ability of drawing synthetic samples from the learned conditional d… ▽ More A common objective in the analysis of tabular data is estimating the conditional distribution (in contrast to only producing predictions) of a set of "outcome" variables given a set of "covariates", which is sometimes referred to as the "density regression" problem. Beyond estimation on the conditional distribution, the generative ability of drawing synthetic samples from the learned conditional distribution is also desired as it further widens the range of applications. We propose a flow-based generative model tailored for the density regression task on tabular data. Our flow applies a sequence of tree-based piecewise-linear transforms on initial uniform noise to eventually generate samples from complex conditional densities of (univariate or multivariate) outcomes given the covariates and allows efficient analytical evaluation of the fitted conditional density on any point in the sample space. We introduce a training algorithm for fitting the tree-based transforms using a divide-and-conquer strategy that transforms maximum likelihood training of the tree-flow into training a collection of binary classifiers--one at each tree split--under cross-entropy loss. We assess the performance of our method under out-of-sample likelihood evaluation and compare it with a variety of state-of-the-art conditional density learners on a range of simulated and real benchmark tabular datasets. Our method consistently achieves comparable or superior performance at a fraction of the training and sampling budget. Finally, we demonstrate the utility of our method's generative ability through an application to generating synthetic longitudinal microbiome compositional data based on training our flow on a publicly available microbiome study. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: 24 pages, 9 figures

arXiv:2406.05225 [pdf, other]

A Manifold Perspective on the Statistical Generalization of Graph Neural Networks

Authors: Zhiyang Wang, Juan Cervino, Alejandro Ribeiro

Abstract: Convolutional neural networks have been successfully extended to operate on graphs, giving rise to Graph Neural Networks (GNNs). GNNs combine information from adjacent nodes by successive applications of graph convolutions. GNNs have been implemented successfully in various learning tasks while the theoretical understanding of their generalization capability is still in progress. In this paper, we… ▽ More Convolutional neural networks have been successfully extended to operate on graphs, giving rise to Graph Neural Networks (GNNs). GNNs combine information from adjacent nodes by successive applications of graph convolutions. GNNs have been implemented successfully in various learning tasks while the theoretical understanding of their generalization capability is still in progress. In this paper, we leverage manifold theory to analyze the statistical generalization gap of GNNs operating on graphs constructed on sampled points from manifolds. We study the generalization gaps of GNNs on both node-level and graph-level tasks. We show that the generalization gaps decrease with the number of nodes in the training graphs, which guarantees the generalization of GNNs to unseen points over manifolds. We validate our theoretical results in multiple real-world datasets. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: 34 pages,22 figures

arXiv:2406.05213 [pdf, other]

On Subjective Uncertainty Quantification and Calibration in Natural Language Generation

Authors: Ziyu Wang, Chris Holmes

Abstract: Applications of large language models often involve the generation of free-form responses, in which case uncertainty quantification becomes challenging. This is due to the need to identify task-specific uncertainties (e.g., about the semantics) which appears difficult to define in general cases. This work addresses these challenges from a perspective of Bayesian decision theory, starting from the… ▽ More Applications of large language models often involve the generation of free-form responses, in which case uncertainty quantification becomes challenging. This is due to the need to identify task-specific uncertainties (e.g., about the semantics) which appears difficult to define in general cases. This work addresses these challenges from a perspective of Bayesian decision theory, starting from the assumption that our utility is characterized by a similarity measure that compares a generated response with a hypothetical true response. We discuss how this assumption enables principled quantification of the model's subjective uncertainty and its calibration. We further derive a measure for epistemic uncertainty, based on a missing data perspective and its characterization as an excess risk. The proposed measures can be applied to black-box language models. We demonstrate the proposed methods on question answering and machine translation tasks, where they extract broadly meaningful uncertainty estimates from GPT and Gemini models and quantify their calibration. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.05193 [pdf, ps, other]

Probabilistic Clustering using Shared Latent Variable Model for Assessing Alzheimers Disease Biomarkers

Authors: Yizhen Xu, Scott Zeger, Zheyu Wang

Abstract: The preclinical stage of many neurodegenerative diseases can span decades before symptoms become apparent. Understanding the sequence of preclinical biomarker changes provides a critical opportunity for early diagnosis and effective intervention prior to significant loss of patients' brain functions. The main challenge to early detection lies in the absence of direct observation of the disease sta… ▽ More The preclinical stage of many neurodegenerative diseases can span decades before symptoms become apparent. Understanding the sequence of preclinical biomarker changes provides a critical opportunity for early diagnosis and effective intervention prior to significant loss of patients' brain functions. The main challenge to early detection lies in the absence of direct observation of the disease state and the considerable variability in both biomarkers and disease dynamics among individuals. Recent research hypothesized the existence of subgroups with distinct biomarker patterns due to co-morbidities and degrees of brain resilience. Our ability to early diagnose and intervene during the preclinical stage of neurodegenerative diseases will be enhanced by further insights into heterogeneity in the biomarker-disease relationship. In this paper, we focus on Alzheimer's disease (AD) and attempt to identify the systematic patterns within the heterogeneous AD biomarker-disease cascade. Specifically, we quantify the disease progression using a dynamic latent variable whose mixture distribution represents patient subgroups. Model estimation uses Hamiltonian Monte Carlo with the number of clusters determined by the Bayesian Information Criterion (BIC). We report simulation studies that investigate the performance of the proposed model in finite sample settings that are similar to our motivating application. We apply the proposed model to the BIOCARD data, a longitudinal study that was conducted over two decades among individuals who were initially cognitively normal. Our application yields evidence consistent with the hypothetical model of biomarker dynamics presented in Jack et al. (2013). In addition, our analysis identified two subgroups with distinct disease-onset patterns. Finally, we develop a dynamic prediction approach to improve the precision of prognoses. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.04575 [pdf, other]

Optimization of geological carbon storage operations with multimodal latent dynamic model and deep reinforcement learning

Authors: Zhongzheng Wang, Yuntian Chen, Guodong Chen, Dongxiao Zhang

Abstract: Maximizing storage performance in geological carbon storage (GCS) is crucial for commercial deployment, but traditional optimization demands resource-intensive simulations, posing computational challenges. This study introduces the multimodal latent dynamic (MLD) model, a deep learning framework for fast flow prediction and well control optimization in GCS. The MLD model includes a representation… ▽ More Maximizing storage performance in geological carbon storage (GCS) is crucial for commercial deployment, but traditional optimization demands resource-intensive simulations, posing computational challenges. This study introduces the multimodal latent dynamic (MLD) model, a deep learning framework for fast flow prediction and well control optimization in GCS. The MLD model includes a representation module for compressed latent representations, a transition module for system state evolution, and a prediction module for flow responses. A novel training strategy combining regression loss and joint-embedding consistency loss enhances temporal consistency and multi-step prediction accuracy. Unlike existing models, the MLD supports diverse input modalities, allowing comprehensive data interactions. The MLD model, resembling a Markov decision process (MDP), can train deep reinforcement learning agents, specifically using the soft actor-critic (SAC) algorithm, to maximize net present value (NPV) through continuous interactions. The approach outperforms traditional methods, achieving the highest NPV while reducing computational resources by over 60%. It also demonstrates strong generalization performance, providing improved decisions for new scenarios based on knowledge from previous ones. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.04329 [pdf, other]

Simplified and Generalized Masked Diffusion for Discrete Data

Authors: Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, Michalis K. Titsias

Abstract: Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. However, existing work in this area has been hindered by unnecessarily complex model formulations and unclear relationships between different perspectives, leading to suboptimal parameterization, training objectives, and ad hoc adjustments to counteract these is… ▽ More Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. However, existing work in this area has been hindered by unnecessarily complex model formulations and unclear relationships between different perspectives, leading to suboptimal parameterization, training objectives, and ad hoc adjustments to counteract these issues. In this work, we aim to provide a simple and general framework that unlocks the full potential of masked diffusion models. We show that the continuous-time variational objective of masked diffusion models is a simple weighted integral of cross-entropy losses. Our framework also enables training generalized masked diffusion models with state-dependent masking schedules. When evaluated by perplexity, our models trained on OpenWebText surpass prior diffusion language models at GPT-2 scale and demonstrate superior performance on 4 out of 5 zero-shot language modeling tasks. Furthermore, our models vastly outperform previous discrete diffusion models on pixel-level image modeling, achieving 2.78~(CIFAR-10) and 3.42 (ImageNet 64$\times$64) bits per dimension that are comparable or better than autoregressive models of similar sizes. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.01561 [pdf, other]

Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation

Authors: Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang

Abstract: Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by develo**… ▽ More Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by develo** long and short classifier-free guidance (LSG) to efficiently distill pretrained Stable Diffusion models without using real training data. SiD aims to optimize a model-based explicit score matching loss, utilizing a score-identity-based approximation alongside the proposed LSG for practical computation. By training exclusively with fake images synthesized with its one-step generator, SiD equipped with LSG rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score. Specifically, its data-free distillation of Stable Diffusion 1.5 achieves a record low FID of 8.15 on the COCO-2014 validation set, with a CLIP score of 0.304 at an LSG scale of 1.5, and a FID of 9.56 with a CLIP score of 0.313 at an LSG scale of 2. Our SiD-LSG code and distilled one-step text-to-image generators are available at https://github.com/mingyuanzhou/SiD-LSG △ Less

Submitted 22 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00793 [pdf, other]

Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective

Authors: Fabian Falck, Ziyu Wang, Chris Holmes

Abstract: In-context learning (ICL) has emerged as a particularly remarkable characteristic of Large Language Models (LLM): given a pretrained LLM and an observed dataset, LLMs can make predictions for new data points from the same distribution without fine-tuning. Numerous works have postulated ICL as approximately Bayesian inference, rendering this a natural hypothesis. In this work, we analyse this hypot… ▽ More In-context learning (ICL) has emerged as a particularly remarkable characteristic of Large Language Models (LLM): given a pretrained LLM and an observed dataset, LLMs can make predictions for new data points from the same distribution without fine-tuning. Numerous works have postulated ICL as approximately Bayesian inference, rendering this a natural hypothesis. In this work, we analyse this hypothesis from a new angle through the martingale property, a fundamental requirement of a Bayesian learning system for exchangeable data. We show that the martingale property is a necessary condition for unambiguous predictions in such scenarios, and enables a principled, decomposed notion of uncertainty vital in trustworthy, safety-critical systems. We derive actionable checks with corresponding theory and test statistics which must hold if the martingale property is satisfied. We also examine if uncertainty in LLMs decreases as expected in Bayesian learning when more data is observed. In three experiments, we provide evidence for violations of the martingale property, and deviations from a Bayesian scaling behaviour of uncertainty, falsifying the hypothesis that ICL is Bayesian. △ Less

Submitted 2 June, 2024; originally announced June 2024.

Comments: Accepted at International Conference on Machine Learning (ICML) 2024

arXiv:2405.20763 [pdf, other]

Improving Generalization and Convergence by Enhancing Implicit Regularization

Authors: Mingze Wang, Haotian He, **bo Wang, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu

Abstract: In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that I… ▽ More In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that IRE can be practically incorporated with {\em generic base optimizers} without introducing significant computational overload. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ {\em speed-up} compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees, showing that IRE can substantially accelerate the convergence towards flat minima in Sharpness-aware Minimization (SAM). △ Less

Submitted 31 May, 2024; originally announced May 2024.

Comments: 35 pages

arXiv:2405.18459 [pdf, other]

Probing the Information Theoretical Roots of Spatial Dependence Measures

Authors: Zhangyu Wang, Krzysztof Janowicz, Gengchen Mai, Ivan Majic

Abstract: Intuitively, there is a relation between measures of spatial dependence and information theoretical measures of entropy. For instance, we can provide an intuition of why spatial data is special by stating that, on average, spatial data samples contain less than expected information. Similarly, spatial data, e.g., remotely sensed imagery, that is easy to compress is also likely to show significant… ▽ More Intuitively, there is a relation between measures of spatial dependence and information theoretical measures of entropy. For instance, we can provide an intuition of why spatial data is special by stating that, on average, spatial data samples contain less than expected information. Similarly, spatial data, e.g., remotely sensed imagery, that is easy to compress is also likely to show significant spatial autocorrelation. Formulating our (highly specific) core concepts of spatial information theory in the widely used language of information theory opens new perspectives on their differences and similarities and also fosters cross-disciplinary collaboration, e.g., with the broader AI/ML communities. Interestingly, however, this intuitive relation is challenging to formalize and generalize, leading prior work to rely mostly on experimental results, e.g., for describing landscape patterns. In this work, we will explore the information theoretical roots of spatial autocorrelation, more specifically Moran's I, through the lens of self-information (also known as surprisal) and provide both formal proofs and experiments. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: COSIT-2024 Conference Proceedings

arXiv:2405.18395 [pdf, other]

MC-GTA: Metric-Constrained Model-Based Clustering using Goodness-of-fit Tests with Autocorrelations

Authors: Zhangyu Wang, Gengchen Mai, Krzysztof Janowicz, Ni Lao

Abstract: A wide range of (multivariate) temporal (1D) and spatial (2D) data analysis tasks, such as grou** vehicle sensor trajectories, can be formulated as clustering with given metric constraints. Existing metric-constrained clustering algorithms overlook the rich correlation between feature similarity and metric distance, i.e., metric autocorrelation. The model-based variations of these clustering alg… ▽ More A wide range of (multivariate) temporal (1D) and spatial (2D) data analysis tasks, such as grou** vehicle sensor trajectories, can be formulated as clustering with given metric constraints. Existing metric-constrained clustering algorithms overlook the rich correlation between feature similarity and metric distance, i.e., metric autocorrelation. The model-based variations of these clustering algorithms (e.g. TICC and STICC) achieve SOTA performance, yet suffer from computational instability and complexity by using a metric-constrained Expectation-Maximization procedure. In order to address these two problems, we propose a novel clustering algorithm, MC-GTA (Model-based Clustering via Goodness-of-fit Tests with Autocorrelations). Its objective is only composed of pairwise weighted sums of feature similarity terms (square Wasserstein-2 distance) and metric autocorrelation terms (a novel multivariate generalization of classic semivariogram). We show that MC-GTA is effectively minimizing the total hinge loss for intra-cluster observation pairs not passing goodness-of-fit tests, i.e., statistically not originating from the same distribution. Experiments on 1D/2D synthetic and real-world datasets demonstrate that MC-GTA successfully incorporates metric autocorrelation. It outperforms strong baselines by large margins (up to 14.3% in ARI and 32.1% in NMI) with faster and stabler optimization (>10x speedup). △ Less

Submitted 2 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: ICML-2024 Proceedings

arXiv:2405.16436 [pdf, other]

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

Authors: Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang

Abstract: Aligning generative models with human preference via RLHF typically suffers from overoptimization, where an imperfectly learned reward model can misguide the generative model to output undesired responses. We investigate this problem in a principled manner by identifying the source of the misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate over… ▽ More Aligning generative models with human preference via RLHF typically suffers from overoptimization, where an imperfectly learned reward model can misguide the generative model to output undesired responses. We investigate this problem in a principled manner by identifying the source of the misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model; one that simultaneously minimizes the maximum likelihood estimation of the loss and a reward penalty term. Here, the reward penalty term is introduced to prevent the policy from choosing actions with spurious high proxy rewards, resulting in provable sample efficiency of the algorithm under a partial coverage style condition. Moving from theory to practice, the proposed algorithm further enjoys an equivalent but surprisingly easy-to-implement reformulation. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines: (i) a preference optimization loss that directly aligns the policy with human preference, and (ii) a supervised learning loss that explicitly imitates the policy with a (suitable) baseline distribution. In the context of aligning large language models (LLM), this objective fuses the direct preference optimization (DPO) loss with the supervised fune-tuning (SFT) loss to help mitigate the overoptimization towards undesired responses, for which we name the algorithm Regularized Preference Optimization (RPO). Experiments of aligning LLMs demonstrate the improved performance of RPO compared with DPO baselines. Our work sheds light on the interplay between preference optimization and SFT in tuning LLMs with both theoretical guarantees and empirical evidence. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Comments: 27 pages, 7 figures

arXiv:2405.07761 [pdf, other]

LLM4ED: Large Language Models for Automatic Equation Discovery

Authors: Mengge Du, Yuntian Chen, Zhongzheng Wang, Longfeng Nie, Dongxiao Zhang

Abstract: Equation discovery is aimed at directly extracting physical laws from data and has emerged as a pivotal research domain. Previous methods based on symbolic mathematics have achieved substantial advancements, but often require the design of implementation of complex algorithms. In this paper, we introduce a new framework that utilizes natural language-based prompts to guide large language models (L… ▽ More Equation discovery is aimed at directly extracting physical laws from data and has emerged as a pivotal research domain. Previous methods based on symbolic mathematics have achieved substantial advancements, but often require the design of implementation of complex algorithms. In this paper, we introduce a new framework that utilizes natural language-based prompts to guide large language models (LLMs) in automatically mining governing equations from data. Specifically, we first utilize the generation capability of LLMs to generate diverse equations in string form, and then evaluate the generated equations based on observations. In the optimization phase, we propose two alternately iterated strategies to optimize generated equations collaboratively. The first strategy is to take LLMs as a black-box optimizer and achieve equation self-improvement based on historical samples and their performance. The second strategy is to instruct LLMs to perform evolutionary operators for global search. Experiments are extensively conducted on both partial differential equations and ordinary differential equations. Results demonstrate that our framework can discover effective equations to reveal the underlying physical laws under various nonlinear dynamic systems. Further comparisons are made with state-of-the-art models, demonstrating good stability and usability. Our framework substantially lowers the barriers to learning and applying equation discovery techniques, demonstrating the application potential of LLMs in the field of knowledge discovery. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.04393 [pdf, other]

Efficient Online Set-valued Classification with Bandit Feedback

Authors: Zhou Wang, Xingye Qiao

Abstract: Conformal prediction is a distribution-free method that wraps a given machine learning model and returns a set of plausible labels that contain the true label with a prescribed coverage rate. In practice, the empirical coverage achieved highly relies on fully observed label information from data both in the training phase for model fitting and the calibration phase for quantile estimation. This de… ▽ More Conformal prediction is a distribution-free method that wraps a given machine learning model and returns a set of plausible labels that contain the true label with a prescribed coverage rate. In practice, the empirical coverage achieved highly relies on fully observed label information from data both in the training phase for model fitting and the calibration phase for quantile estimation. This dependency poses a challenge in the context of online learning with bandit feedback, where a learner only has access to the correctness of actions (i.e., pulled an arm) but not the full information of the true label. In particular, when the pulled arm is incorrect, the learner only knows that the pulled one is not the true class label, but does not know which label is true. Additionally, bandit feedback further results in a smaller labeled dataset for calibration, limited to instances with correct actions, thereby affecting the accuracy of quantile estimation. To address these limitations, we propose Bandit Class-specific Conformal Prediction (BCCP), offering coverage guarantees on a class-specific granularity. Using an unbiased estimation of an estimand involving the true label, BCCP trains the model and makes set-valued inferences through stochastic gradient descent. Our approach overcomes the challenges of sparsely labeled data in each iteration and generalizes the reliability and applicability of conformal prediction to online decision-making environments. △ Less

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03329 [pdf, other]

Policy Learning for Balancing Short-Term and Long-Term Rewards

Authors: Peng Wu, Ziyu Shen, Feng Xie, Zhongyao Wang, Chunchen Liu, Yan Zeng

Abstract: Empirical researchers and decision-makers spanning various domains frequently seek profound insights into the long-term impacts of interventions. While the significance of long-term outcomes is undeniable, an overemphasis on them may inadvertently overshadow short-term gains. Motivated by this, this paper formalizes a new framework for learning the optimal policy that effectively balances both lon… ▽ More Empirical researchers and decision-makers spanning various domains frequently seek profound insights into the long-term impacts of interventions. While the significance of long-term outcomes is undeniable, an overemphasis on them may inadvertently overshadow short-term gains. Motivated by this, this paper formalizes a new framework for learning the optimal policy that effectively balances both long-term and short-term rewards, where some long-term outcomes are allowed to be missing. In particular, we first present the identifiability of both rewards under mild assumptions. Next, we deduce the semiparametric efficiency bounds, along with the consistency and asymptotic normality of their estimators. We also reveal that short-term outcomes, if associated, contribute to improving the estimator of the long-term reward. Based on the proposed estimators, we develop a principled policy learning approach and further derive the convergence rates of regret and estimation errors associated with the learned policy. Extensive experiments are conducted to validate the effectiveness of the proposed method, demonstrating its practical applicability. △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.19292 [pdf, other]

Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning

Authors: Qiaosheng Zhang, Chenjia Bai, Shuyue Hu, Zhen Wang, Xuelong Li

Abstract: This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory, and are proven to be sample efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic t… ▽ More This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory, and are proven to be sample efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium. The basic algorithm, referred to as MAIDS, employs an asymmetric learning structure where the max-player first solves a minimax optimization problem based on the joint information ratio of the joint policy, and the min-player then minimizes the marginal information ratio with the max-player's policy fixed. Theoretical analyses show that it achieves a Bayesian regret of tilde{O}(sqrt{K}) for K episodes. To reduce the computational load of MAIDS, we develop an improved algorithm called Reg-MAIDS, which has the same Bayesian regret bound while enjoying less computational complexity. Moreover, by leveraging the flexibility of IDS principle in choosing the learning target, we propose two methods for constructing compressed environments based on rate-distortion theory, upon which we develop an algorithm Compressed-MAIDS wherein the learning target is a compressed environment. Finally, we extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample efficient manner. △ Less

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.12312 [pdf, ps, other]

A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimiax Optimization

Authors: Yuchen Zhu, Yufeng Zhang, Zhaoran Wang, Zhuoran Yang, Xiaohong Chen

Abstract: This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. In particular, we consider the minimax optimization problem stemming from estimating linear functional equations defined by conditional expectations, where the objective functions are quadratic in the functional spaces. We address (i) the convergence o… ▽ More This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. In particular, we consider the minimax optimization problem stemming from estimating linear functional equations defined by conditional expectations, where the objective functions are quadratic in the functional spaces. We address (i) the convergence of the stochastic gradient descent-ascent algorithm and (ii) the representation learning of the neural networks. We establish convergence under the mean-field regime by considering the continuous-time and infinite-width limit of the optimization dynamics. Under this regime, the stochastic gradient descent-ascent corresponds to a Wasserstein gradient flow over the space of probability measures defined over the space of neural network parameters. We prove that the Wasserstein gradient flow converges globally to a stationary point of the minimax objective at a $O(T^{-1} + α^{-1})$ sublinear rate, and additionally finds the solution to the functional equation when the regularizer of the minimax objective is strongly convex. Here $T$ denotes the time and $α$ is a scaling parameter of the neural networks. In terms of representation learning, our results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(α^{-1})$, measured in terms of the Wasserstein distance. Finally, we apply our general results to concrete examples including policy evaluation, nonparametric instrumental variable regression, asset pricing, and adversarial Riesz representer estimation. △ Less

Submitted 25 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: Submitted

arXiv:2404.08667 [pdf, other]

Traffic State Estimation and Uncertainty Quantification at Signalized Intersections with Low Penetration Rate Vehicle Trajectory Data

Authors: Xingmin Wang, Zihao Wang, Zachary Jerome, Henry X. Liu

Abstract: This paper studies the traffic state estimation problem at signalized intersections with low penetration rate vehicle trajectory data. While many existing studies have proposed different methods to estimate unknown traffic states and parameters (e.g., penetration rate, queue length) with this data, most of them only provide a point estimation without knowing the uncertainty of these estimated valu… ▽ More This paper studies the traffic state estimation problem at signalized intersections with low penetration rate vehicle trajectory data. While many existing studies have proposed different methods to estimate unknown traffic states and parameters (e.g., penetration rate, queue length) with this data, most of them only provide a point estimation without knowing the uncertainty of these estimated values. It is important to quantify the estimation uncertainty caused by limited available data since it can explicitly inform us whether the available data is sufficient to satisfy the desired estimation accuracy. To fill this gap, we formulate the partially observable system as a hidden Markov model (HMM) based on the recently developed probabilistic time-space (PTS) model. The PTS model is a stochastic traffic flow model that is designed for modeling traffic flow dynamics near signalized intersections. Based on the HMM formulation, a single recursive program is developed for the Bayesian estimation of both traffic states and parameters. As a Bayesian approach, the proposed method provides the distributional estimation outcomes and directly quantifies the estimation uncertainty. We validate the proposed method with simulation studies and showcase its applicability to real-world vehicle trajectory data. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.07323 [pdf, other]

Surrogate modeling for probability distribution estimation:uniform or adaptive design?

Authors: Maijia Su, Ziqi Wang, Oreste Salvatore Bursi, Marco Broccardo

Abstract: The active learning (AL) technique, one of the state-of-the-art methods for constructing surrogate models, has shown high accuracy and efficiency in forward uncertainty quantification (UQ) analysis. This paper provides a comprehensive study on AL-based global surrogates for computing the full distribution function, i.e., the cumulative distribution function (CDF) and the complementary CDF (CCDF).… ▽ More The active learning (AL) technique, one of the state-of-the-art methods for constructing surrogate models, has shown high accuracy and efficiency in forward uncertainty quantification (UQ) analysis. This paper provides a comprehensive study on AL-based global surrogates for computing the full distribution function, i.e., the cumulative distribution function (CDF) and the complementary CDF (CCDF). To this end, we investigate the three essential components for building surrogates, i.e., types of surrogate models, enrichment methods for experimental designs, and stop** criteria. For each component, we choose several representative methods and study their desirable configurations. In addition, we devise a uniform design (i.e., space-filling design) as a baseline for measuring the improvement of using AL. Combining all the representative methods, a total of 1,920 UQ analyses are carried out to solve 16 benchmark examples. The performance of the selected strategies is evaluated based on accuracy and efficiency. In the context of full distribution estimation, this study concludes that (i) AL techniques cannot provide a systematic improvement compared with uniform designs, (ii) the recommended surrogate modeling methods depend on the features of the problems (especially the local nonlinearity), target accuracy, and computational budget. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06984 [pdf, other]

Adaptive Strategy of Testing Alphas in High Dimensional Linear Factor Pricing Models

Authors: Chenxi Zhao, ** Zhao, Long Feng, Zhaojun Wang

Abstract: In recent years, there has been considerable research on testing alphas in high-dimensional linear factor pricing models. In our study, we introduce a novel max-type test procedure that performs well under sparse alternatives. Furthermore, we demonstrate that this new max-type test procedure is asymptotically independent from the sum-type test procedure proposed by Pesaran and Yamagata (2017). Bui… ▽ More In recent years, there has been considerable research on testing alphas in high-dimensional linear factor pricing models. In our study, we introduce a novel max-type test procedure that performs well under sparse alternatives. Furthermore, we demonstrate that this new max-type test procedure is asymptotically independent from the sum-type test procedure proposed by Pesaran and Yamagata (2017). Building on this, we propose a Fisher combination test procedure that exhibits good performance for both dense and sparse alternatives. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.04057 [pdf, other]

Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

Authors: Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, Hai Huang

Abstract: We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By refo… ▽ More We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By reformulating forward diffusion processes as semi-implicit distributions, we leverage three score-related identities to create an innovative loss mechanism. This mechanism achieves rapid FID reduction by training the generator using its own synthesized images, eliminating the need for real data or reverse-diffusion-based generation, all accomplished within significantly shortened generation time. Upon evaluation across four benchmark datasets, the SiD algorithm demonstrates high iteration efficiency during distillation and surpasses competing distillation approaches, whether they are one-step or few-step, data-free, or dependent on training data, in terms of generation quality. This achievement not only redefines the benchmarks for efficiency and effectiveness in diffusion distillation but also in the broader field of diffusion-based generation. The PyTorch implementation is available at https://github.com/mingyuanzhou/SiD △ Less

Submitted 24 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

Comments: ICML 2024, PyTorch implementation: https://github.com/mingyuanzhou/SiD

arXiv:2403.19629 [pdf, other]

Metric Learning from Limited Pairwise Preference Comparisons

Authors: Zhi Wang, Geelon So, Ramya Korlakai Vinayak

Abstract: We study metric learning from preference comparisons under the ideal point model, in which a user prefers an item over another if it is closer to their latent ideal item. These items are embedded into $\mathbb{R}^d$ equipped with an unknown Mahalanobis distance shared across users. While recent work shows that it is possible to simultaneously recover the metric and ideal items given… ▽ More We study metric learning from preference comparisons under the ideal point model, in which a user prefers an item over another if it is closer to their latent ideal item. These items are embedded into $\mathbb{R}^d$ equipped with an unknown Mahalanobis distance shared across users. While recent work shows that it is possible to simultaneously recover the metric and ideal items given $\mathcal{O}(d)$ pairwise comparisons per user, in practice we often have a limited budget of $o(d)$ comparisons. We study whether the metric can still be recovered, even though it is known that learning individual ideal items is now no longer possible. We show that in general, $o(d)$ comparisons reveals no information about the metric, even with infinitely many users. However, when comparisons are made over items that exhibit low-dimensional structure, each user can contribute to learning the metric restricted to a low-dimensional subspace so that the metric can be jointly identified. We present a divide-and-conquer approach that achieves this, and provide theoretical recovery guarantees and empirical validation. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19381 [pdf, other]

On Uncertainty Quantification for Near-Bayes Optimal Algorithms

Authors: Ziyu Wang, Chris Holmes

Abstract: Bayesian modelling allows for the quantification of predictive uncertainty which is crucial in safety-critical applications. Yet for many machine learning (ML) algorithms, it is difficult to construct or implement their Bayesian counterpart. In this work we present a promising approach to address this challenge, based on the hypothesis that commonly used ML algorithms are efficient across a wide v… ▽ More Bayesian modelling allows for the quantification of predictive uncertainty which is crucial in safety-critical applications. Yet for many machine learning (ML) algorithms, it is difficult to construct or implement their Bayesian counterpart. In this work we present a promising approach to address this challenge, based on the hypothesis that commonly used ML algorithms are efficient across a wide variety of tasks and may thus be near Bayes-optimal w.r.t. an unknown task distribution. We prove that it is possible to recover the Bayesian posterior defined by the task distribution, which is unknown but optimal in this setting, by building a martingale posterior using the algorithm. We further propose a practical uncertainty quantification method that apply to general ML algorithms. Experiments based on a variety of non-NN and NN algorithms demonstrate the efficacy of our method. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.18540 [pdf, other]

skscope: Fast Sparsity-Constrained Optimization in Python

Authors: Zezhi Wang, ** Zhu, Peng Chen, Huiyang Peng, Xiaoke Zhang, Anran Wang, Yu Zheng, Junxian Zhu, Xueqin Wang

Abstract: Applying iterative solvers on sparsity-constrained optimization (SCO) requires tedious mathematical deduction and careful programming/debugging that hinders these solvers' broad impact. In the paper, the library skscope is introduced to overcome such an obstacle. With skscope, users can solve the SCO by just programming the objective function. The convenience of skscope is demonstrated through two… ▽ More Applying iterative solvers on sparsity-constrained optimization (SCO) requires tedious mathematical deduction and careful programming/debugging that hinders these solvers' broad impact. In the paper, the library skscope is introduced to overcome such an obstacle. With skscope, users can solve the SCO by just programming the objective function. The convenience of skscope is demonstrated through two examples in the paper, where sparse linear regression and trend filtering are addressed with just four lines of code. More importantly, skscope's efficient implementation allows state-of-the-art solvers to quickly attain the sparse solution regardless of the high dimensionality of parameter space. Numerical experiments reveal the available solvers in skscope can achieve up to 80x speedup on the competing relaxation solutions obtained via the benchmarked convex solver. skscope is published on the Python Package Index (PyPI) and Conda, and its source code is available at: https://github.com/abess-team/skscope. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: 4 pages

arXiv:2403.16825 [pdf, ps, other]

Weak Convergence Analysis of Online Neural Actor-Critic Algorithms

Authors: Samuel Chun-Hei Lam, Justin Sirignano, Ziheng Wang

Abstract: We prove that a single-layer neural network trained with the online actor critic algorithm converges in distribution to a random ordinary differential equation (ODE) as the number of hidden units and the number of training steps $\rightarrow \infty$. In the online actor-critic algorithm, the distribution of the data samples dynamically changes as the model is updated, which is a key challenge for… ▽ More We prove that a single-layer neural network trained with the online actor critic algorithm converges in distribution to a random ordinary differential equation (ODE) as the number of hidden units and the number of training steps $\rightarrow \infty$. In the online actor-critic algorithm, the distribution of the data samples dynamically changes as the model is updated, which is a key challenge for any convergence analysis. We establish the geometric ergodicity of the data samples under a fixed actor policy. Then, using a Poisson equation, we prove that the fluctuations of the model updates around the limit distribution due to the randomly-arriving data samples vanish as the number of parameter updates $\rightarrow \infty$. Using the Poisson equation and weak convergence techniques, we prove that the actor neural network and critic neural network converge to the solutions of a system of ODEs with random initial conditions. Analysis of the limit ODE shows that the limit critic network will converge to the true value function, which will provide the actor an asymptotically unbiased estimate of the policy gradient. We then prove that the limit actor network will converge to a stationary point. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.14830 [pdf, other]

Deep Clustering Evaluation: How to Validate Internal Clustering Validation Measures

Authors: Zeya Wang, Chenglong Ye

Abstract: Deep clustering, a method for partitioning complex, high-dimensional data using deep neural networks, presents unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic for deep clustering, which involves projecting data into lower-dimensional embeddings before partitioning. Two key issues are identified: 1) the curse of dimensio… ▽ More Deep clustering, a method for partitioning complex, high-dimensional data using deep neural networks, presents unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic for deep clustering, which involves projecting data into lower-dimensional embeddings before partitioning. Two key issues are identified: 1) the curse of dimensionality when applying these measures to raw data, and 2) the unreliable comparison of clustering results across different embedding spaces stemming from variations in training procedures and parameter settings in different clustering models. This paper addresses these challenges in evaluating clustering quality in deep learning. We present a theoretical framework to highlight ineffectiveness arising from using internal validation measures on raw and embedded data and propose a systematic approach to applying clustering validity indices in deep clustering contexts. Experiments show that this framework aligns better with external validation measures, effectively reducing the misguidance from the improper use of clustering validity indices in deep learning. △ Less

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.13196 [pdf, other]

ADAPT to Robustify Prompt Tuning Vision Transformers

Authors: Masih Eskandar, Tooba Imtiaz, Zifeng Wang, Jennifer Dy

Abstract: The performance of deep models, including Vision Transformers, is known to be vulnerable to adversarial attacks. Many existing defenses against these attacks, such as adversarial training, rely on full-model fine-tuning to induce robustness in the models. These defenses require storing a copy of the entire model, that can have billions of parameters, for each task. At the same time, parameter-effi… ▽ More The performance of deep models, including Vision Transformers, is known to be vulnerable to adversarial attacks. Many existing defenses against these attacks, such as adversarial training, rely on full-model fine-tuning to induce robustness in the models. These defenses require storing a copy of the entire model, that can have billions of parameters, for each task. At the same time, parameter-efficient prompt tuning is used to adapt large transformer-based models to downstream tasks without the need to save large copies. In this paper, we examine parameter-efficient prompt tuning of Vision Transformers for downstream tasks under the lens of robustness. We show that previous adversarial defense methods, when applied to the prompt tuning paradigm, suffer from gradient obfuscation and are vulnerable to adaptive attacks. We introduce ADAPT, a novel framework for performing adaptive adversarial training in the prompt tuning paradigm. Our method achieves competitive robust accuracy of ~40% w.r.t. SOTA robustness methods using full-model fine-tuning, by tuning only ~1% of the number of parameters. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.13081 [pdf, other]

Parameter Estimation from Single Patient, Single Time-Point Sequencing Data of Recurrent Tumors

Authors: Kevin Leder, Ru** Sun, Zicheng Wang, Xuanming Zhang

Abstract: In this study, we develop consistent estimators for key parameters that govern the dynamics of tumor cell populations when subjected to pharmacological treatments. While these treatments often lead to an initial reduction in the abundance of drug-sensitive cells, a population of drug-resistant cells frequently emerges over time, resulting in cancer recurrence. Samples from recurrent tumors present… ▽ More In this study, we develop consistent estimators for key parameters that govern the dynamics of tumor cell populations when subjected to pharmacological treatments. While these treatments often lead to an initial reduction in the abundance of drug-sensitive cells, a population of drug-resistant cells frequently emerges over time, resulting in cancer recurrence. Samples from recurrent tumors present as an invaluable data source that can offer crucial insights into the ability of cancer cells to adapt and withstand treatment interventions. To effectively utilize the data obtained from recurrent tumors, we derive several large number limit theorems, specifically focusing on the metrics that quantify the clonal diversity of cancer cell populations at the time of cancer recurrence. These theorems then serve as the foundation for constructing our estimators. A distinguishing feature of our approach is that our estimators only require a single time-point sequencing data from a single tumor, thereby enhancing the practicality of our approach and enabling the understanding of cancer recurrence at the individual level. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.11429 [pdf, other]

Long-range Ising model for regional-scale seismic risk analysis

Authors: Sebin Oh, Sang-ri Yi, Ziqi Wang

Abstract: This study introduces the long-range Ising model from statistical mechanics to the Performance-Based Earthquake Engineering (PBEE) framework for regional seismic damage analysis. The application of the PBEE framework at a regional scale involves estimating the damage states of numerous structures, typically performed using fragility function-based stochastic simulations. However, these simulations… ▽ More This study introduces the long-range Ising model from statistical mechanics to the Performance-Based Earthquake Engineering (PBEE) framework for regional seismic damage analysis. The application of the PBEE framework at a regional scale involves estimating the damage states of numerous structures, typically performed using fragility function-based stochastic simulations. However, these simulations often assume conditional independence or employ simplistic dependency models among the damage states of structures, leading to significant misrepresentation of regional risk. The Ising model addresses this issue by converting the available information on binary damage states (safe or failure) into a joint probability mass function, leveraging the principle of maximum entropy. The Ising model offers two main benefits: (1) it requires only the first- and second-order cross-moments, enabling seamless integration with the existing PBEE framework, and (2) it provides meaningful physical interpretations of the model parameters, facilitating the uncovering of insights not apparent from data. To demonstrate the proposed method, we applied the Ising model to 156 buildings in Antakya, Turkey, using post-hazard damage evaluation data, and to 182 buildings in Pacific Heights, San Francisco, using simulated data from the Regional Resilience Determination (R2D) tool. In both instances, the Ising model accurately reproduces the provided information and generates meaningful insights into regional damage. The study also investigates the change in Ising model parameters under varying earthquake magnitudes, along with the mean-field approximation, further facilitating the applicability of the proposed approach. △ Less

Submitted 23 May, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.00283 [pdf, other]

Risk Twin: Real-time Risk Visualization and Control for Structural Systems

Authors: Zeyu Wang, Ziqi Wang

Abstract: Digital twinning in structural engineering is a rapidly evolving technology that aims to eliminate the gap between physical systems and their digital models through real-time sensing, visualization, and control techniques. Although digital twins can offer dynamic insights into physical systems, their accuracy is inevitably compromised by uncertainties in sensing, modeling, simulation, and controll… ▽ More Digital twinning in structural engineering is a rapidly evolving technology that aims to eliminate the gap between physical systems and their digital models through real-time sensing, visualization, and control techniques. Although digital twins can offer dynamic insights into physical systems, their accuracy is inevitably compromised by uncertainties in sensing, modeling, simulation, and controlling. This paper proposes a specialized digital twin formulation, named Risk Twin, designed for real-time risk visualization and risk-informed control of structural systems. Integrating structural reliability and Bayesian inference methods with digital twining techniques, Risk Twin can analyze and visualize the reliability indices for structural components in real-time. To facilitate real-time inference and reliability updating, a "simulation-free" scheme is proposed, leveraging precomputed quantities prepared during an offline phase for rapid inference in the online phase. Proof-of-concept numerical and real-world Risk Twins are constructed to showcase the proposed concepts. △ Less

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.10810 [pdf, ps, other]

Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning

Authors: Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, Mengdi Wang

Abstract: We study the Constrained Convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure, subject to a convex constraint. Designing algorithms for a constrained convex MDP faces several challenges, including (1) handling the large state space, (2) managing the exploration/exploitation tradeoff, and (3) solving the constrained optimization where the… ▽ More We study the Constrained Convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure, subject to a convex constraint. Designing algorithms for a constrained convex MDP faces several challenges, including (1) handling the large state space, (2) managing the exploration/exploitation tradeoff, and (3) solving the constrained optimization where the objective and the constraint are both nonlinear functions of the visitation measure. In this work, we present a model-based algorithm, Variational Primal-Dual Policy Optimization (VPDPO), in which Lagrangian and Fenchel duality are implemented to reformulate the original constrained problem into an unconstrained primal-dual optimization. Moreover, the primal variables are updated by model-based value iteration following the principle of Optimism in the Face of Uncertainty (OFU), while the dual variables are updated by gradient ascent. Moreover, by embedding the visitation measure into a finite-dimensional space, we can handle large state spaces by incorporating function approximation. Two notable examples are (1) Kernelized Nonlinear Regulators and (2) Low-rank MDPs. We prove that with an optimistic planning oracle, our algorithm achieves sublinear regret and constraint violation in both cases and can attain the globally optimal policy of the original constrained problem. △ Less

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10127 [pdf, other]

Nonlinear spiked covariance matrices and signal propagation in deep neural networks

Authors: Zhichao Wang, Denny Wu, Zhou Fan

Abstract: Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network. However, existing results only establish weak convergence of the empirical eigenvalue distribution, and fall short of providing precise quantitative characterizations of the ''spike'' eigenvalues and eigenvectors that often capture the low-dimens… ▽ More Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network. However, existing results only establish weak convergence of the empirical eigenvalue distribution, and fall short of providing precise quantitative characterizations of the ''spike'' eigenvalues and eigenvectors that often capture the low-dimensional signal structure of the learning problem. In this work, we characterize these signal eigenvalues and eigenvectors for a nonlinear version of the spiked covariance model, including the CK as a special case. Using this general result, we give a quantitative description of how spiked eigenstructure in the input data propagates through the hidden layers of a neural network with random weights. As a second application, we study a simple regime of representation learning where the weight matrix develops a rank-one signal component over training and characterize the alignment of the target function with the spike eigenvector of the CK on test data. △ Less

Submitted 15 February, 2024; originally announced February 2024.

Comments: 55 pages

arXiv:2402.08539 [pdf]

Intelligent Diagnosis of Alzheimer's Disease Based on Machine Learning

Authors: Mingyang Li, Hongyu Liu, Yixuan Li, Zejun Wang, Yuan Yuan, Honglin Dai

Abstract: This study is based on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and aims to explore early detection and disease progression in Alzheimer's disease (AD). We employ innovative data preprocessing strategies, including the use of the random forest algorithm to fill missing data and the handling of outliers and invalid data, thereby fully mining and utilizing these limited data re… ▽ More This study is based on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and aims to explore early detection and disease progression in Alzheimer's disease (AD). We employ innovative data preprocessing strategies, including the use of the random forest algorithm to fill missing data and the handling of outliers and invalid data, thereby fully mining and utilizing these limited data resources. Through Spearman correlation coefficient analysis, we identify some features strongly correlated with AD diagnosis. We build and test three machine learning models using these features: random forest, XGBoost, and support vector machine (SVM). Among them, the XGBoost model performs the best in terms of diagnostic performance, achieving an accuracy of 91%. Overall, this study successfully overcomes the challenge of missing data and provides valuable insights into early detection of Alzheimer's disease, demonstrating its unique research value and practical significance. △ Less

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.07227 [pdf, other]

Time-Delayed Game Strategy Analysis Among Japan, Other Nations, and the International Atomic Energy Agency in the Context of Fukushima Nuclear Wastewater Discharge Decision

Authors: Mingyang Li, Han Pengsihua, Fujiao Meng, Zejun Wang, Weian Liu

Abstract: This academic paper examines the strategic interactions between Japan, other nations, and the International Atomic Energy Agency (IAEA) regarding Japan's decision to release treated nuclear wastewater from the Fukushima Daiichi Nuclear Power Plant into the sea. It introduces a payoff matrix and time-delay elements in replicator dynamic equations to mirror real-world decision-making delays. The pap… ▽ More This academic paper examines the strategic interactions between Japan, other nations, and the International Atomic Energy Agency (IAEA) regarding Japan's decision to release treated nuclear wastewater from the Fukushima Daiichi Nuclear Power Plant into the sea. It introduces a payoff matrix and time-delay elements in replicator dynamic equations to mirror real-world decision-making delays. The paper analyzes the stability of strategies and conditions for different stable states using characteristic roots of a linearized system and numerical simulations. It concludes that time delays significantly affect decision-making stability and evolution trajectories in nuclear wastewater disposal strategies. The study highlights the importance of efficient wastewater treatment technology, the impact of export tax revenue losses on Japan's strategies, and the role of international cooperation. The novelty of the research lies in integrating time-delay elements from ocean dynamics and governmental decision-making into the game-theoretical model. △ Less

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.07210 [pdf, other]

Fukushima Nuclear Wastewater Discharge: An Evolutionary Game Theory Approach to International and Domestic Interaction and Strategic Decision-Making

Authors: Mingyang Li, Han Pengsihua, Songqing Zhao, Zejun Wang, Limin Yang, Weian Liu

Abstract: On August 24, 2023, Japan controversially decided to discharge nuclear wastewater from the Fukushima Daiichi Nuclear Power Plant into the ocean, sparking intense domestic and global debates. This study uses evolutionary game theory to analyze the strategic dynamics between Japan, other countries, and the Japan Fisheries Association. By incorporating economic, legal, international aid, and environm… ▽ More On August 24, 2023, Japan controversially decided to discharge nuclear wastewater from the Fukushima Daiichi Nuclear Power Plant into the ocean, sparking intense domestic and global debates. This study uses evolutionary game theory to analyze the strategic dynamics between Japan, other countries, and the Japan Fisheries Association. By incorporating economic, legal, international aid, and environmental factors, the research identifies three evolutionarily stable strategies, analyzing them via numerical simulations. The focus is on Japan's shift from wastewater release to its cessation, exploring the myriad factors influencing this transition and their effects on stakeholders' decisions. Key insights highlight the need for international cooperation, rigorous scientific research, public education, and effective wastewater treatment methods. Offering both a fresh theoretical perspective and practical guidance, this study aims to foster global consensus on nuclear wastewater management, crucial for marine conservation and sustainable development. △ Less

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.04582 [pdf, other]

Dimensionality reduction can be used as a surrogate model for high-dimensional forward uncertainty quantification

Authors: Jungho Kim, Sang-ri Yi, Ziqi Wang

Abstract: We introduce a method to construct a stochastic surrogate model from the results of dimensionality reduction in forward uncertainty quantification. The hypothesis is that the high-dimensional input augmented by the output of a computational model admits a low-dimensional representation. This assumption can be met by numerous uncertainty quantification applications with physics-based computational… ▽ More We introduce a method to construct a stochastic surrogate model from the results of dimensionality reduction in forward uncertainty quantification. The hypothesis is that the high-dimensional input augmented by the output of a computational model admits a low-dimensional representation. This assumption can be met by numerous uncertainty quantification applications with physics-based computational models. The proposed approach differs from a sequential application of dimensionality reduction followed by surrogate modeling, as we "extract" a surrogate model from the results of dimensionality reduction in the input-output space. This feature becomes desirable when the input space is genuinely high-dimensional. The proposed method also diverges from the Probabilistic Learning on Manifold, as a reconstruction map** from the feature space to the input-output space is circumvented. The final product of the proposed method is a stochastic simulator that propagates a deterministic input into a stochastic output, preserving the convenience of a sequential "dimensionality reduction + Gaussian process regression" approach while overcoming some of its limitations. The proposed method is demonstrated through two uncertainty quantification problems characterized by high-dimensional input uncertainties. △ Less

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03954 [pdf, other]

Mixed Matrix Completion in Complex Survey Sampling under Heterogeneous Missingness

Authors: Xiaojun Mao, Hengfang Wang, Zhonglei Wang, Shu Yang

Abstract: Modern surveys with large sample sizes and growing mixed-type questionnaires require robust and scalable analysis methods. In this work, we consider recovering a mixed dataframe matrix, obtained by complex survey sampling, with entries following different canonical exponential distributions and subject to heterogeneous missingness. To tackle this challenging task, we propose a two-stage procedure:… ▽ More Modern surveys with large sample sizes and growing mixed-type questionnaires require robust and scalable analysis methods. In this work, we consider recovering a mixed dataframe matrix, obtained by complex survey sampling, with entries following different canonical exponential distributions and subject to heterogeneous missingness. To tackle this challenging task, we propose a two-stage procedure: in the first stage, we model the entry-wise missing mechanism by logistic regression, and in the second stage, we complete the target parameter matrix by maximizing a weighted log-likelihood with a low-rank constraint. We propose a fast and scalable estimation algorithm that achieves sublinear convergence, and the upper bound for the estimation error of the proposed method is rigorously derived. Experimental results support our theoretical claims, and the proposed estimator shows its merits compared to other existing methods. The proposed method is applied to analyze the National Health and Nutrition Examination Survey data. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Comments: Journal of Computational and Graphical Statistics, 2023

arXiv:2402.01887 [pdf, other]

On f-Divergence Principled Domain Adaptation: An Improved Framework

Authors: Ziqiao Wang, Yongyi Mao

Abstract: Unsupervised domain adaptation (UDA) plays a crucial role in addressing distribution shifts in machine learning. In this work, we improve the theoretical foundations of UDA proposed by Acuna et al. (2021) by refining their f-divergence-based discrepancy and additionally introducing a new measure, f-domain discrepancy (f-DD). By removing the absolute value function and incorporating a scaling param… ▽ More Unsupervised domain adaptation (UDA) plays a crucial role in addressing distribution shifts in machine learning. In this work, we improve the theoretical foundations of UDA proposed by Acuna et al. (2021) by refining their f-divergence-based discrepancy and additionally introducing a new measure, f-domain discrepancy (f-DD). By removing the absolute value function and incorporating a scaling parameter, f-DD yields novel target error and sample complexity bounds, allowing us to recover previous KL-based results and bridging the gap between algorithms and theory presented in Acuna et al. (2021). Leveraging a localization technique, we also develop a fast-rate generalization bound. Empirical results demonstrate the superior performance of f-DD-based domain learning algorithms over previous works in popular UDA benchmarks. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2402.01381 [pdf, other]

Spatial-Sign based Maxsum Test for High Dimensional Location Parameters

Authors: Jixuan Liu, Long Feng, Zhaojun Wang

Abstract: In this study, we explore a robust testing procedure for the high-dimensional location parameters testing problem. Initially, we introduce a spatial-sign based max-type test statistic, which exhibits excellent performance for sparse alternatives. Subsequently, we demonstrate the asymptotic independence between this max-type test statistic and the spatial-sign based sum-type test statistic (Feng an… ▽ More In this study, we explore a robust testing procedure for the high-dimensional location parameters testing problem. Initially, we introduce a spatial-sign based max-type test statistic, which exhibits excellent performance for sparse alternatives. Subsequently, we demonstrate the asymptotic independence between this max-type test statistic and the spatial-sign based sum-type test statistic (Feng and Sun, 2016). Building on this, we propose a spatial-sign based max-sum type testing procedure, which shows remarkable performance under varying signal sparsity. Our simulation studies underscore the superior performance of the procedures we propose. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.17585 [pdf, other]

Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks

Authors: Wenyue Hua, Jiang Guo, Mingwen Dong, Henghui Zhu, Patrick Ng, Zhiguo Wang

Abstract: Current approaches of knowledge editing struggle to effectively propagate updates to interconnected facts. In this work, we delve into the barriers that hinder the appropriate propagation of updated knowledge within these models for accurate reasoning. To support our analysis, we introduce a novel reasoning-based benchmark -- ReCoE (Reasoning-based Counterfactual Editing dataset) -- which covers s… ▽ More Current approaches of knowledge editing struggle to effectively propagate updates to interconnected facts. In this work, we delve into the barriers that hinder the appropriate propagation of updated knowledge within these models for accurate reasoning. To support our analysis, we introduce a novel reasoning-based benchmark -- ReCoE (Reasoning-based Counterfactual Editing dataset) -- which covers six common reasoning schemes in real world. We conduct a thorough analysis of existing knowledge editing techniques, including input augmentation, finetuning, and locate-and-edit. We found that all model editing methods show notably low performance on this dataset, especially in certain reasoning schemes. Our analysis over the chain-of-thought generation of edited models further uncover key reasons behind the inadequacy of existing knowledge editing methods from a reasoning standpoint, involving aspects on fact-wise editing, fact recall ability, and coherence in generation. We will make our benchmark publicly available. △ Less

Submitted 30 January, 2024; originally announced January 2024.

Comments: 22 pages, 14 figures, 5 tables

arXiv:2401.14657 [pdf, other]

Validating Climate Models with Spherical Convolutional Wasserstein Distance

Authors: Robert C. Garrett, Trevor Harris, Bo Li, Zhuo Wang

Abstract: The validation of global climate models is crucial to ensure the accuracy and efficacy of model output. We introduce the spherical convolutional Wasserstein distance to more comprehensively measure differences between climate models and reanalysis data. This new similarity measure accounts for spatial variability using convolutional projections and quantifies local differences in the distribution… ▽ More The validation of global climate models is crucial to ensure the accuracy and efficacy of model output. We introduce the spherical convolutional Wasserstein distance to more comprehensively measure differences between climate models and reanalysis data. This new similarity measure accounts for spatial variability using convolutional projections and quantifies local differences in the distribution of climate variables. We apply this method to evaluate the historical model outputs of the Coupled Model Intercomparison Project (CMIP) members by comparing them to observational and reanalysis data products. Additionally, we investigate the progression from CMIP phase 5 to phase 6 and find modest improvements in the phase 6 models regarding their ability to produce realistic climatologies. △ Less

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.14052 [pdf, ps, other]

Testing Alpha in High Dimensional Linear Factor Pricing Models with Dependent Observations

Authors: Huifang Ma, Long Feng, Zhaojun Wang, Jigang Bao

Abstract: In this study, we introduce three distinct testing methods for testing alpha in high dimensional linear factor pricing model that deals with dependent data. The first method is a sum-type test procedure, which exhibits high performance when dealing with dense alternatives. The second method is a max-type test procedure, which is particularly effective for sparse alternatives. For a broader range o… ▽ More In this study, we introduce three distinct testing methods for testing alpha in high dimensional linear factor pricing model that deals with dependent data. The first method is a sum-type test procedure, which exhibits high performance when dealing with dense alternatives. The second method is a max-type test procedure, which is particularly effective for sparse alternatives. For a broader range of alternatives, we suggest a Cauchy combination test procedure. This is predicated on the asymptotic independence of the sum-type and max-type test statistics. Both simulation studies and practical data application demonstrate the effectiveness of our proposed methods when handling dependent observations. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Showing 1–50 of 708 results for author: Wang, Z