Search | arXiv e-print repository

Fusion of Movement and Naive Predictions for Point Forecasting in Univariate Random Walks

Abstract: Traditional methods for point forecasting in univariate random walks often fail to surpass naive benchmarks due to data unpredictability. This study introduces a novel forecasting method that fuses movement prediction (binary classification) with naive forecasts for accurate one-step-ahead point forecasting. The method's efficacy is demonstrated through theoretical analysis, simulations, and real-… ▽ More Traditional methods for point forecasting in univariate random walks often fail to surpass naive benchmarks due to data unpredictability. This study introduces a novel forecasting method that fuses movement prediction (binary classification) with naive forecasts for accurate one-step-ahead point forecasting. The method's efficacy is demonstrated through theoretical analysis, simulations, and real-world data experiments. It reliably exceeds naive forecasts with movement prediction accuracies as low as 0.55, outperforming baseline models like ARIMA, linear regression, MLP, and LSTM networks in forecasting the S\&P 500 index and Bitcoin prices. This method is particularly advantageous when accurate point predictions are challenging but accurate movement predictions are attainable, translating movement predictions into point forecasts in random walk contexts. △ Less

Submitted 24 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.10650 [pdf, other]

The Implicit Bias of Adam on Separable Data

Authors: Chenyang Zhang, Difan Zou, Yuan Cao

Abstract: Adam has become one of the most favored optimizers in deep learning problems. Despite its success in practice, numerous mysteries persist regarding its theoretical understanding. In this paper, we study the implicit bias of Adam in linear logistic regression. Specifically, we show that when the training data are linearly separable, Adam converges towards a linear classifier that achieves the maxim… ▽ More Adam has become one of the most favored optimizers in deep learning problems. Despite its success in practice, numerous mysteries persist regarding its theoretical understanding. In this paper, we study the implicit bias of Adam in linear logistic regression. Specifically, we show that when the training data are linearly separable, Adam converges towards a linear classifier that achieves the maximum $\ell_\infty$-margin. Notably, for a general class of diminishing learning rates, this convergence occurs within polynomial time. Our result shed light on the difference between Adam and (stochastic) gradient descent from a theoretical perspective. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: 33 pages, 2 figures

arXiv:2405.19752 [pdf, ps, other]

Understanding Memory-Regret Trade-Off for Streaming Stochastic Multi-Armed Bandits

Authors: Yuchen He, Zichun Ye, Chihao Zhang

Abstract: We study the stochastic multi-armed bandit problem in the $P$-pass streaming model. In this problem, the $n$ arms are present in a stream and at most $m<n$ arms and their statistics can be stored in the memory. We give a complete characterization of the optimal regret in terms of $m, n$ and $P$. Specifically, we design an algorithm with… ▽ More We study the stochastic multi-armed bandit problem in the $P$-pass streaming model. In this problem, the $n$ arms are present in a stream and at most $m<n$ arms and their statistics can be stored in the memory. We give a complete characterization of the optimal regret in terms of $m, n$ and $P$. Specifically, we design an algorithm with $\tilde O\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)$ regret and complement it with an $\tilde Ω\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)$ lower bound when the number of rounds $T$ is sufficiently large. Our results are tight up to a logarithmic factor in $n$ and $P$. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.18997 [pdf, other]

Kernel Semi-Implicit Variational Inference

Authors: Ziheng Cheng, Longlin Yu, Tianyu Xie, Shiyue Zhang, Cheng Zhang

Abstract: Semi-implicit variational inference (SIVI) extends traditional variational families with semi-implicit distributions defined in a hierarchical manner. Due to the intractable densities of semi-implicit distributions, classical SIVI often resorts to surrogates of evidence lower bound (ELBO) that would introduce biases for training. A recent advancement in SIVI, named SIVI-SM, utilizes an alternative… ▽ More Semi-implicit variational inference (SIVI) extends traditional variational families with semi-implicit distributions defined in a hierarchical manner. Due to the intractable densities of semi-implicit distributions, classical SIVI often resorts to surrogates of evidence lower bound (ELBO) that would introduce biases for training. A recent advancement in SIVI, named SIVI-SM, utilizes an alternative score matching objective made tractable via a minimax formulation, albeit requiring an additional lower-level optimization. In this paper, we propose kernel SIVI (KSIVI), a variant of SIVI-SM that eliminates the need for lower-level optimization through kernel tricks. Specifically, we show that when optimizing over a reproducing kernel Hilbert space (RKHS), the lower-level problem has an explicit solution. This way, the upper-level objective becomes the kernel Stein discrepancy (KSD), which is readily computable for stochastic gradient descent due to the hierarchical structure of semi-implicit variational distributions. An upper bound for the variance of the Monte Carlo gradient estimators of the KSD objective is derived, which allows us to establish novel convergence guarantees of KSIVI. We demonstrate the effectiveness and efficiency of KSIVI on both synthetic distributions and a variety of real data Bayesian inference tasks. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: ICML 2024 camera ready

arXiv:2405.18836 [pdf, other]

Do Finetti: On Causal Effects for Exchangeable Data

Authors: Siyuan Guo, Chi Zhang, Karthika Mohan, Ferenc Huszár, Bernhard Schölkopf

Abstract: We study causal effect estimation in a setting where the data are not i.i.d. (independent and identically distributed). We focus on exchangeable data satisfying an assumption of independent causal mechanisms. Traditional causal effect estimation frameworks, e.g., relying on structural causal models and do-calculus, are typically limited to i.i.d. data and do not extend to more general exchangeable… ▽ More We study causal effect estimation in a setting where the data are not i.i.d. (independent and identically distributed). We focus on exchangeable data satisfying an assumption of independent causal mechanisms. Traditional causal effect estimation frameworks, e.g., relying on structural causal models and do-calculus, are typically limited to i.i.d. data and do not extend to more general exchangeable generative processes, which naturally arise in multi-environment data. To address this gap, we develop a generalized framework for exchangeable data and introduce a truncated factorization formula that facilitates both the identification and estimation of causal effects in our setting. To illustrate potential applications, we introduce a causal Pólya urn model and demonstrate how intervention propagates effects in exchangeable data settings. Finally, we develop an algorithm that performs simultaneous causal discovery and effect estimation given multi-environment data. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.16577 [pdf, other]

Reflected Flow Matching

Authors: Tianyu Xie, Yu Zhu, Longlin Yu, Tong Yang, Ziheng Cheng, Shiyue Zhang, Xiangyu Zhang, Cheng Zhang

Abstract: Continuous normalizing flows (CNFs) learn an ordinary differential equation to transform prior samples into data. Flow matching (FM) has recently emerged as a simulation-free approach for training CNFs by regressing a velocity model towards the conditional velocity field. However, on constrained domains, the learned velocity model may lead to undesirable flows that result in highly unnatural sampl… ▽ More Continuous normalizing flows (CNFs) learn an ordinary differential equation to transform prior samples into data. Flow matching (FM) has recently emerged as a simulation-free approach for training CNFs by regressing a velocity model towards the conditional velocity field. However, on constrained domains, the learned velocity model may lead to undesirable flows that result in highly unnatural samples, e.g., oversaturated images, due to both flow matching error and simulation error. To address this, we add a boundary constraint term to CNFs, which leads to reflected CNFs that keep trajectories within the constrained domains. We propose reflected flow matching (RFM) to train the velocity model in reflected CNFs by matching the conditional velocity fields in a simulation-free manner, similar to the vanilla FM. Moreover, the analytical form of conditional velocity fields in RFM avoids potentially biased approximations, making it superior to existing score-based generative models on constrained domains. We demonstrate that RFM achieves comparable or better results on standard image benchmarks and produces high-quality class-conditioned samples under high guidance weight. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Comments: ICML 2024 camera-ready

arXiv:2405.12838 [pdf, ps, other]

Quantum Non-Identical Mean Estimation: Efficient Algorithms and Fundamental Limits

Authors: Jiachen Hu, Tongyang Li, Xinzhao Wang, Yecheng Xue, Chenyi Zhang, Han Zhong

Abstract: We systematically investigate quantum algorithms and lower bounds for mean estimation given query access to non-identically distributed samples. On the one hand, we give quantum mean estimators with quadratic quantum speed-up given samples from different bounded or sub-Gaussian random variables. On the other hand, we prove that, in general, it is impossible for any quantum algorithm to achieve qua… ▽ More We systematically investigate quantum algorithms and lower bounds for mean estimation given query access to non-identically distributed samples. On the one hand, we give quantum mean estimators with quadratic quantum speed-up given samples from different bounded or sub-Gaussian random variables. On the other hand, we prove that, in general, it is impossible for any quantum algorithm to achieve quadratic speed-up over the number of classical samples needed to estimate the mean $μ$, where the samples come from different random variables with mean close to $μ$. Technically, our quantum algorithms reduce bounded and sub-Gaussian random variables to the Bernoulli case, and use an uncomputation trick to overcome the challenge that direct amplitude estimation does not work with non-identical query access. Our quantum query lower bounds are established by simulating non-identical oracles by parallel oracles, and also by an adversarial method with non-identical oracles. Both results pave the way for proving quantum query lower bounds with non-identical oracles in general, which may be of independent interest. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 31 pages, 0 figure. To appear in the 19th Theory of Quantum Computation, Communication and Cryptography (TQC 2024)

arXiv:2405.08005 [pdf, other]

Graphon Mean Field Games with a Representative Player: Analysis and Learning Algorithm

Authors: Fuzhong Zhou, Chenyu Zhang, Xu Chen, Xuan Di

Abstract: We propose a discrete time graphon game formulation on continuous state and action spaces using a representative player to study stochastic games with heterogeneous interaction among agents. This formulation admits both philosophical and mathematical advantages, compared to a widely adopted formulation using a continuum of players. We prove the existence and uniqueness of the graphon equilibrium w… ▽ More We propose a discrete time graphon game formulation on continuous state and action spaces using a representative player to study stochastic games with heterogeneous interaction among agents. This formulation admits both philosophical and mathematical advantages, compared to a widely adopted formulation using a continuum of players. We prove the existence and uniqueness of the graphon equilibrium with mild assumptions, and show that this equilibrium can be used to construct an approximate solution for finite player game on networks, which is challenging to analyze and solve due to curse of dimensionality. An online oracle-free learning algorithm is developed to solve the equilibrium numerically, and sample complexity analysis is provided for its convergence. △ Less

Submitted 4 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: Published as a conference paper at ICML 2024

arXiv:2404.16023 [pdf, other]

Learning Car-Following Behaviors Using Bayesian Matrix Normal Mixture Regression

Authors: Chengyuan Zhang, Kehua Chen, Meixin Zhu, Hai Yang, Lijun Sun

Abstract: Learning and understanding car-following (CF) behaviors are crucial for microscopic traffic simulation. Traditional CF models, though simple, often lack generalization capabilities, while many data-driven methods, despite their robustness, operate as "black boxes" with limited interpretability. To bridge this gap, this work introduces a Bayesian Matrix Normal Mixture Regression (MNMR) model that s… ▽ More Learning and understanding car-following (CF) behaviors are crucial for microscopic traffic simulation. Traditional CF models, though simple, often lack generalization capabilities, while many data-driven methods, despite their robustness, operate as "black boxes" with limited interpretability. To bridge this gap, this work introduces a Bayesian Matrix Normal Mixture Regression (MNMR) model that simultaneously captures feature correlations and temporal dynamics inherent in CF behaviors. This approach is distinguished by its separate learning of row and column covariance matrices within the model framework, offering an insightful perspective into the human driver decision-making processes. Through extensive experiments, we assess the model's performance across various historical steps of inputs, predictive steps of outputs, and model complexities. The results consistently demonstrate our model's adeptness in effectively capturing the intricate correlations and temporal dynamics present during CF. A focused case study further illustrates the model's outperforming interpretability of identifying distinct operational conditions through the learned mean and covariance matrices. This not only underlines our model's effectiveness in understanding complex human driving behaviors in CF scenarios but also highlights its potential as a tool for enhancing the interpretability of CF behaviors in traffic simulations and autonomous driving systems. △ Less

Submitted 24 April, 2024; originally announced April 2024.

Comments: 6 pages, Accepted by the 35th IEEE Intelligent Vehicles Symposium

arXiv:2404.13836 [pdf, other]

MultiFun-DAG: Multivariate Functional Directed Acyclic Graph

Authors: Tian Lan, Ziyue Li, Junpeng Lin, Zhishuai Li, Lei Bai, Man Li, Fugee Tsung, Rui Zhao, Chen Zhang

Abstract: Directed Acyclic Graphical (DAG) models efficiently formulate causal relationships in complex systems. Traditional DAGs assume nodes to be scalar variables, characterizing complex systems under a facile and oversimplified form. This paper considers that nodes can be multivariate functional data and thus proposes a multivariate functional DAG (MultiFun-DAG). It constructs a hidden bilinear multivar… ▽ More Directed Acyclic Graphical (DAG) models efficiently formulate causal relationships in complex systems. Traditional DAGs assume nodes to be scalar variables, characterizing complex systems under a facile and oversimplified form. This paper considers that nodes can be multivariate functional data and thus proposes a multivariate functional DAG (MultiFun-DAG). It constructs a hidden bilinear multivariate function-to-function regression to describe the causal relationships between different nodes. Then an Expectation-Maximum algorithm is used to learn the graph structure as a score-based algorithm with acyclic constraints. Theoretical properties are diligently derived. Prudent numerical studies and a case study from urban traffic congestion analysis are conducted to show MultiFun-DAG's effectiveness. △ Less

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13302 [pdf, other]

Monte Carlo sampling with integrator snippets

Authors: Christophe Andrieu, Mauro Camara Escudero, Chang Zhang

Abstract: Assume interest is in sampling from a probability distribution $μ$ defined on $(\mathsf{Z},\mathscr{Z})$. We develop a framework to construct sampling algorithms taking full advantage of numerical integrators of ODEs, say $ψ\colon\mathsf{Z}\rightarrow\mathsf{Z}$ for one integration step, to explore $μ$ efficiently and robustly. The popular Hybrid/Hamiltonian Monte Carlo (HMC) algorithm [Duane, 198… ▽ More Assume interest is in sampling from a probability distribution $μ$ defined on $(\mathsf{Z},\mathscr{Z})$. We develop a framework to construct sampling algorithms taking full advantage of numerical integrators of ODEs, say $ψ\colon\mathsf{Z}\rightarrow\mathsf{Z}$ for one integration step, to explore $μ$ efficiently and robustly. The popular Hybrid/Hamiltonian Monte Carlo (HMC) algorithm [Duane, 1987], [Neal, 2011] and its derivatives are example of such a use of numerical integrators. However we show how the potential of integrators can be exploited beyond current ideas and HMC sampling in order to take into account aspects of the geometry of the target distribution. A key idea is the notion of integrator snippet, a fragment of the orbit of an ODE numerical integrator $ψ$, and its associate probability distribution $\barμ$, which takes the form of a mixture of distributions derived from $μ$ and $ψ$. Exploiting properties of mixtures we show how samples from $\barμ$ can be used to estimate expectations with respect to $μ$. We focus here primarily on Sequential Monte Carlo (SMC) algorithms, but the approach can be used in the context of Markov chain Monte Carlo algorithms as discussed at the end of the manuscript. We illustrate performance of these new algorithms through numerical experimentation and provide preliminary theoretical results supporting observed performance. △ Less

Submitted 20 April, 2024; originally announced April 2024.

MSC Class: 65C05; 65C35 ACM Class: I.6.8; G.3

arXiv:2404.10942 [pdf, other]

What Hides behind Unfairness? Exploring Dynamics Fairness in Reinforcement Learning

Authors: Zhihong Deng, **g Jiang, Guodong Long, Chengqi Zhang

Abstract: In sequential decision-making problems involving sensitive attributes like race and gender, reinforcement learning (RL) agents must carefully consider long-term fairness while maximizing returns. Recent works have proposed many different types of fairness notions, but how unfairness arises in RL problems remains unclear. In this paper, we address this gap in the literature by investigating the sou… ▽ More In sequential decision-making problems involving sensitive attributes like race and gender, reinforcement learning (RL) agents must carefully consider long-term fairness while maximizing returns. Recent works have proposed many different types of fairness notions, but how unfairness arises in RL problems remains unclear. In this paper, we address this gap in the literature by investigating the sources of inequality through a causal lens. We first analyse the causal relationships governing the data generation process and decompose the effect of sensitive attributes on long-term well-being into distinct components. We then introduce a novel notion called dynamics fairness, which explicitly captures the inequality stemming from environmental dynamics, distinguishing it from those induced by decision-making or inherited from the past. This notion requires evaluating the expected changes in the next state and the reward induced by changing the value of the sensitive attribute while holding everything else constant. To quantitatively evaluate this counterfactual concept, we derive identification formulas that allow us to obtain reliable estimations from data. Extensive experiments demonstrate the effectiveness of the proposed techniques in explaining, detecting, and reducing inequality in reinforcement learning. We publicly release code at https://github.com/familyld/InsightFair. △ Less

Submitted 28 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 13 pages, 9 figures, accepted by IJCAI 2024

arXiv:2404.08169 [pdf, other]

AutoGFI: Streamlined Generalized Fiducial Inference for Modern Inference Problems

Authors: Wei Du, Jan Hannig, Thomas C. M. Lee, Yi Su, Chunzhe Zhang

Abstract: The origins of fiducial inference trace back to the 1930s when R. A. Fisher first introduced the concept as a response to what he perceived as a limitation of Bayesian inference - the requirement for a subjective prior distribution on model parameters in cases where no prior information was available. However, Fisher's initial fiducial approach fell out of favor as complications arose, particularl… ▽ More The origins of fiducial inference trace back to the 1930s when R. A. Fisher first introduced the concept as a response to what he perceived as a limitation of Bayesian inference - the requirement for a subjective prior distribution on model parameters in cases where no prior information was available. However, Fisher's initial fiducial approach fell out of favor as complications arose, particularly in multi-parameter problems. In the wake of 2000, amidst a renewed interest in contemporary adaptations of fiducial inference, generalized fiducial inference (GFI) emerged to extend Fisher's fiducial argument, providing a promising avenue for addressing numerous crucial and practical inference challenges. Nevertheless, the adoption of GFI has been limited due to its often demanding mathematical derivations and the necessity for implementing complex Markov Chain Monte Carlo algorithms. This complexity has impeded its widespread utilization and practical applicability. This paper presents a significant advancement by introducing an innovative variant of GFI designed to alleviate these challenges. Specifically, this paper proposes AutoGFI, an easily implementable algorithm that streamlines the application of GFI to a broad spectrum of inference problems involving additive noise. AutoGFI can be readily implemented as long as a fitting routine is available, making it accessible to a broader audience of researchers and practitioners. To demonstrate its effectiveness, AutoGFI is applied to three contemporary and challenging problems: tensor regression, matrix completion, and regression with network cohesion. These case studies highlight the immense potential of GFI and illustrate AutoGFI's promising performance when compared to specialized solutions for these problems. Overall, this research paves the way for a more accessible and powerful application of GFI in a range of practical domains. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.06969 [pdf, other]

FiP: a Fixed-Point Approach for Causal Generative Modeling

Authors: Meyer Scetbon, Joel Jennings, Agrin Hilmkil, Cheng Zhang, Chao Ma

Abstract: Modeling true world data-generating processes lies at the heart of empirical science. Structural Causal Models (SCMs) and their associated Directed Acyclic Graphs (DAGs) provide an increasingly popular answer to such problems by defining the causal generative process that transforms random noise into observations. However, learning them from observational data poses an ill-posed and NP-hard invers… ▽ More Modeling true world data-generating processes lies at the heart of empirical science. Structural Causal Models (SCMs) and their associated Directed Acyclic Graphs (DAGs) provide an increasingly popular answer to such problems by defining the causal generative process that transforms random noise into observations. However, learning them from observational data poses an ill-posed and NP-hard inverse problem in general. In this work, we propose a new and equivalent formalism that does not require DAGs to describe them, viewed as fixed-point problems on the causally ordered variables, and we show three important cases where they can be uniquely recovered given the topological ordering (TO). To the best of our knowledge, we obtain the weakest conditions for their recovery when TO is known. Based on this, we design a two-stage causal generative model that first infers the causal order from observations in a zero-shot manner, thus by-passing the search, and then learns the generative fixed-point SCM on the ordered variables. To infer TOs from observations, we propose to amortize the learning of TOs on generated datasets by sequentially predicting the leaves of graphs seen during training. To learn fixed-point SCMs, we design a transformer-based architecture that exploits a new attention mechanism enabling the modeling of causal structures, and show that this parameterization is consistent with our formalism. Finally, we conduct an extensive evaluation of each method individually, and show that when combined, our model outperforms various baselines on generated out-of-distribution problems. △ Less

Submitted 14 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.04403 [pdf, other]

Low-Rank Robust Subspace Tensor Clustering for Metro Passenger Flow Modeling

Authors: Jiuyun Hu, Ziyue Li, Chen Zhang, Fugee Tsung, Hao Yan

Abstract: Tensor clustering has become an important topic, specifically in spatio-temporal modeling, due to its ability to cluster spatial modes (e.g., stations or road segments) and temporal modes (e.g., time of the day or day of the week). Our motivating example is from subway passenger flow modeling, where similarities between stations are commonly found. However, the challenges lie in the innate high-di… ▽ More Tensor clustering has become an important topic, specifically in spatio-temporal modeling, due to its ability to cluster spatial modes (e.g., stations or road segments) and temporal modes (e.g., time of the day or day of the week). Our motivating example is from subway passenger flow modeling, where similarities between stations are commonly found. However, the challenges lie in the innate high-dimensionality of tensors and also the potential existence of anomalies. This is because the three tasks, i.e., dimension reduction, clustering, and anomaly decomposition, are inter-correlated to each other, and treating them in a separate manner will render a suboptimal performance. Thus, in this work, we design a tensor-based subspace clustering and anomaly decomposition technique for simultaneously outlier-robust dimension reduction and clustering for high-dimensional tensors. To achieve this, a novel low-rank robust subspace clustering decomposition model is proposed by combining Tucker decomposition, sparse anomaly decomposition, and subspace clustering. An effective algorithm based on Block Coordinate Descent is proposed to update the parameters. Prudent experiments prove the effectiveness of the proposed framework via the simulation study, with a gain of +25% clustering accuracy than benchmark methods in a hard case. The interrelations of the three tasks are also analyzed via ablation studies, validating the interrelation assumption. Moreover, a case study in the station clustering based on real passenger flow data is conducted, with quite valuable insights discovered. △ Less

Submitted 5 April, 2024; originally announced April 2024.

Comments: Conditionally Accepted in INFORMS Journal of Data Science

arXiv:2404.03329 [pdf]

DeepFunction: Deep Metric Learning-based Imbalanced Classification for Diagnosing Threaded Pipe Connection Defects using Functional Data

Authors: Yukun Xie, Juan Du, Chen Zhang

Abstract: In modern manufacturing, most of the product lines are conforming. Few products are nonconforming but with different defect types. The identification of defect types can help further root cause diagnosis of production lines. With the sensing development, signals of process variables can be collected in high resolution, which can be regarded as multichannel functional data. They have abundant infor… ▽ More In modern manufacturing, most of the product lines are conforming. Few products are nonconforming but with different defect types. The identification of defect types can help further root cause diagnosis of production lines. With the sensing development, signals of process variables can be collected in high resolution, which can be regarded as multichannel functional data. They have abundant information to characterize the process and help identify the defect types. Motivated by a real example from the pipe tightening process, we focus on defect classification where each sample is a multichannel functional data. However, the available samples for each defect type are limited and imbalanced. Moreover, the functions are incomplete since the pre-tightening process before the pipe tightening process is unobserved. To classify the defect samples based on imbalanced, multichannel, and incomplete functional data is very important but challenging. Thus, we propose an innovative classification framework based on deep metric learning using functional data (DeepFunction). The framework leverages the power of deep metric learning to train on imbalanced datasets. A neural network specially crafted for processing functional data is also proposed to handle multichannel and incomplete functional data. The results from a real-world case study demonstrate the superior accuracy of our framework when compared to existing benchmarks. △ Less

Submitted 24 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Revised version for submission to IISE Transactions

arXiv:2404.00220 [pdf, other]

Partially-Observable Sequential Change-Point Detection for Autocorrelated Data via Upper Confidence Region

Authors: Haijie Xu, Xiaochen Xian, Chen Zhang, Kaibo Liu

Abstract: Sequential change point detection for multivariate autocorrelated data is a very common problem in practice. However, when the sensing resources are limited, only a subset of variables from the multivariate system can be observed at each sensing time point. This raises the problem of partially observable multi-sensor sequential change point detection. For it, we propose a detection scheme called a… ▽ More Sequential change point detection for multivariate autocorrelated data is a very common problem in practice. However, when the sensing resources are limited, only a subset of variables from the multivariate system can be observed at each sensing time point. This raises the problem of partially observable multi-sensor sequential change point detection. For it, we propose a detection scheme called adaptive upper confidence region with state space model (AUCRSS). It models multivariate time series via a state space model (SSM), and uses an adaptive sampling policy for efficient change point detection and localization. A partially-observable Kalman filter algorithm is developed for online inference of SSM, and accordingly, a change point detection scheme based on a generalized likelihood ratio test is developed. How its detection power relates to the adaptive sampling strategy is analyzed. Meanwhile, by treating the detection power as a reward, its connection with the online combinatorial multi-armed bandit (CMAB) problem is formulated and an adaptive upper confidence region algorithm is proposed for adaptive sampling policy design. Theoretical analysis of the asymptotic average detection delay is performed, and thorough numerical studies with synthetic data and real-world data are conducted to demonstrate the effectiveness of our method. △ Less

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2404.00218 [pdf, other]

Functional-Edged Network Modeling

Authors: Haijie Xu, Chen Zhang

Abstract: Contrasts with existing works which all consider nodes as functions and use edges to represent the relationships between different functions. We target at network modeling whose edges are functional data and transform the adjacency matrix into a functional adjacency tensor, introducing an additional dimension dedicated to function representation. Tucker functional decomposition is used for the fun… ▽ More Contrasts with existing works which all consider nodes as functions and use edges to represent the relationships between different functions. We target at network modeling whose edges are functional data and transform the adjacency matrix into a functional adjacency tensor, introducing an additional dimension dedicated to function representation. Tucker functional decomposition is used for the functional adjacency tensor, and to further consider the community between nodes, we regularize the basis matrices to be symmetrical. Furthermore, to deal with irregular observations of the functional edges, we conduct model inference to solve a tensor completion problem. It is optimized by a Riemann conjugate gradient descent method. Besides these, we also derive several theorems to show the desirable properties of the functional edged network model. Finally, we evaluate the efficacy of our proposed model using simulation data and real metro system data from Hong Kong and Singapore. △ Less

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.19851 [pdf, other]

Localizing Paragraph Memorization in Language Models

Authors: Niklas Stoehr, Mitchell Gordon, Chiyuan Zhang, Owen Lewis

Abstract: Can we localize the weights and mechanisms used by a language model to memorize and recite entire paragraphs of its training data? In this paper, we show that while memorization is spread across multiple layers and model components, gradients of memorized paragraphs have a distinguishable spatial pattern, being larger in lower model layers than gradients of non-memorized examples. Moreover, the me… ▽ More Can we localize the weights and mechanisms used by a language model to memorize and recite entire paragraphs of its training data? In this paper, we show that while memorization is spread across multiple layers and model components, gradients of memorized paragraphs have a distinguishable spatial pattern, being larger in lower model layers than gradients of non-memorized examples. Moreover, the memorized examples can be unlearned by fine-tuning only the high-gradient weights. We localize a low-layer attention head that appears to be especially involved in paragraph memorization. This head is predominantly focusing its attention on distinctive, rare tokens that are least frequent in a corpus-level unigram distribution. Next, we study how localized memorization is across the tokens in the prefix by perturbing tokens and measuring the caused change in the decoding. A few distinctive tokens early in a prefix can often corrupt the entire continuation. Overall, memorized continuations are not only harder to unlearn, but also to corrupt than non-memorized ones. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2402.11156 [pdf, other]

Efficient Low-Rank Matrix Estimation, Experimental Design, and Arm-Set-Dependent Low-Rank Bandits

Authors: Kyoungseok Jang, Chicheng Zhang, Kwang-Sung Jun

Abstract: We study low-rank matrix trace regression and the related problem of low-rank matrix bandits. Assuming access to the distribution of the covariates, we propose a novel low-rank matrix estimation method called LowPopArt and provide its recovery guarantee that depends on a novel quantity denoted by B(Q) that characterizes the hardness of the problem, where Q is the covariance matrix of the measureme… ▽ More We study low-rank matrix trace regression and the related problem of low-rank matrix bandits. Assuming access to the distribution of the covariates, we propose a novel low-rank matrix estimation method called LowPopArt and provide its recovery guarantee that depends on a novel quantity denoted by B(Q) that characterizes the hardness of the problem, where Q is the covariance matrix of the measurement distribution. We show that our method can provide tighter recovery guarantees than classical nuclear norm penalized least squares (Koltchinskii et al., 2011) in several problems. To perform efficient estimation with a limited number of measurements from an arbitrarily given measurement set A, we also propose a novel experimental design criterion that minimizes B(Q) with computational efficiency. We leverage our novel estimator and design of experiments to derive two low-rank linear bandit algorithms for general arm sets that enjoy improved regret upper bounds. This improves over previous works on low-rank bandits, which make somewhat restrictive assumptions that the arm set is the unit ball or that an efficient exploration distribution is given. To our knowledge, our experimental design criterion is the first one tailored to low-rank matrix estimation beyond the naive reduction to linear regression, which can be of independent interest. △ Less

Submitted 8 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2401.17504 [pdf, other]

CaMU: Disentangling Causal Effects in Deep Model Unlearning

Authors: Shaofei Shen, Chenhao Zhang, Alina Bialkowski, Weitong Chen, Miao Xu

Abstract: Machine unlearning requires removing the information of forgetting data while kee** the necessary information of remaining data. Despite recent advancements in this area, existing methodologies mainly focus on the effect of removing forgetting data without considering the negative impact this can have on the information of the remaining data, resulting in significant performance degradation afte… ▽ More Machine unlearning requires removing the information of forgetting data while kee** the necessary information of remaining data. Despite recent advancements in this area, existing methodologies mainly focus on the effect of removing forgetting data without considering the negative impact this can have on the information of the remaining data, resulting in significant performance degradation after data removal. Although some methods try to repair the performance of remaining data after removal, the forgotten information can also return after repair. Such an issue is due to the intricate intertwining of the forgetting and remaining data. Without adequately differentiating the influence of these two kinds of data on the model, existing algorithms take the risk of either inadequate removal of the forgetting data or unnecessary loss of valuable information from the remaining data. To address this shortcoming, the present study undertakes a causal analysis of the unlearning and introduces a novel framework termed Causal Machine Unlearning (CaMU). This framework adds intervention on the information of remaining data to disentangle the causal effects between forgetting data and remaining data. Then CaMU eliminates the causal impact associated with forgetting data while concurrently preserving the causal relevance of the remaining data. Comprehensive empirical results on various datasets and models suggest that CaMU enhances performance on the remaining data and effectively minimizes the influences of forgetting data. Notably, this work is the first to interpret deep model unlearning tasks from a new perspective of causality and provide a solution based on causal analysis, which opens up new possibilities for future research in deep model unlearning. △ Less

Submitted 30 January, 2024; originally announced January 2024.

Comments: Full version of the paper accepted for the SDM 24 conference

arXiv:2312.07727 [pdf, other]

Two-sample inference for sparse functional data

Authors: Chi Zhang, Peijun Sang, Yingli Qin

Abstract: We propose a novel test procedure for comparing mean functions across two groups within the reproducing kernel Hilbert space (RKHS) framework. Our proposed method is adept at handling sparsely and irregularly sampled functional data when observation times are random for each subject. Conventional approaches, which are built upon functional principal components analysis, usually assume a homogeneou… ▽ More We propose a novel test procedure for comparing mean functions across two groups within the reproducing kernel Hilbert space (RKHS) framework. Our proposed method is adept at handling sparsely and irregularly sampled functional data when observation times are random for each subject. Conventional approaches, which are built upon functional principal components analysis, usually assume a homogeneous covariance structure across groups. Nonetheless, justifying this assumption in real-world scenarios can be challenging. To eliminate the need for a homogeneous covariance structure, we first develop the functional Bahadur representation for the mean estimator under the RKHS framework; this representation naturally leads to the desirable pointwise limiting distributions. Moreover, we establish weak convergence for the mean estimator, allowing us to construct a test statistic for the mean difference. Our method is easily implementable and outperforms some conventional tests in controlling type I errors across various settings. We demonstrate the finite sample performance of our approach through extensive simulations and two real-world applications. △ Less

Submitted 29 December, 2023; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.12293 [pdf]

Sample size calculation based on the difference in restricted mean time lost for clinical trials with competing risks

Authors: Xiang Geng, Zhao** Li, Chengfeng Zhang, Yanjie Wang, Haoning Shen, Zhiheng Huang, Yawen Hou, Zheng Chen

Abstract: Computation of sample size is important when designing clinical trials. The presence of competing risks makes the design of clinical trials with time-to-event endpoints cumbersome. A model based on the subdistribution hazard ratio (SHR) is commonly used for trials under competing risks. However, this approach has some limitations related to model assumptions and clinical interpretation. Considerin… ▽ More Computation of sample size is important when designing clinical trials. The presence of competing risks makes the design of clinical trials with time-to-event endpoints cumbersome. A model based on the subdistribution hazard ratio (SHR) is commonly used for trials under competing risks. However, this approach has some limitations related to model assumptions and clinical interpretation. Considering such limitations, the difference in restricted mean time lost (RMTLd) is recommended as an alternative indicator. In this paper, we propose a sample size calculation method based on the RMTLd for the Weibull distribution (RMTLdWeibull) for clinical trials, which considers experimental conditions such as equal allocation, uniform accrual, uniform loss to follow-up, and administrative censoring. Simulation results show that sample size calculation based on the RMTLdWeibull can generally achieve a predefined power level and maintain relative robustness. Moreover, the performance of the sample size calculation based on the RMTLdWeibull is similar or superior to that based on the SHR. Even if the event time does not follow the Weibull distribution, the sample size calculation based on the RMTLdWeibull still performs well. The results also verify the performance of the sample size calculation method based on the RMTLdWeibull. From the perspective of the results of this study, clinical interpretation, application conditions and statistical performance, we recommend that when designing clinical trials in the presence of competing risks, the RMTLd indicator be applied for sample size calculation and subsequent effect size measurement. △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.11563 [pdf]

Time-varying effect in the competing risks based on restricted mean time lost

Authors: Zhiyin Yu, Zhao** Li, Chengfeng Zhang, Yawen Hou, Derun Zhou, Zheng Chen

Abstract: Patients with breast cancer tend to die from other diseases, so for studies that focus on breast cancer, a competing risks model is more appropriate. Considering subdistribution hazard ratio, which is used often, limited to model assumptions and clinical interpretation, we aimed to quantify the effects of prognostic factors by an absolute indicator, the difference in restricted mean time lost (RMT… ▽ More Patients with breast cancer tend to die from other diseases, so for studies that focus on breast cancer, a competing risks model is more appropriate. Considering subdistribution hazard ratio, which is used often, limited to model assumptions and clinical interpretation, we aimed to quantify the effects of prognostic factors by an absolute indicator, the difference in restricted mean time lost (RMTL), which is more intuitive. Additionally, prognostic factors may have dynamic effects (time-varying effects) in long-term follow-up. However, existing competing risks regression models only provide a static view of covariate effects, leading to a distorted assessment of the prognostic factor. To address this issue, we proposed a dynamic effect RMTL regression that can explore the between-group cumulative difference in mean life lost over a period of time and obtain the real-time effect by the speed of accumulation, as well as personalized predictions on a time scale. Through Monte Carlo simulation, we validated the dynamic effects estimated by the proposed regression having low bias and a coverage rate of around 95%. Applying this model to an elderly early-stage breast cancer cohort, we found that most factors had different patterns of dynamic effects, revealing meaningful physiological mechanisms underlying diseases. Moreover, from the perspective of prediction, the mean C-index in external validation reached 0.78. Dynamic effect RMTL regression can analyze both dynamic cumulative effects and real-time effects of covariates, providing a more comprehensive prognosis and better prediction when competing risks exist. △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.03989 [pdf, other]

Learned Causal Method Prediction

Authors: Shantanu Gupta, Cheng Zhang, Agrin Hilmkil

Abstract: For a given causal question, it is important to efficiently decide which causal inference method to use for a given dataset. This is challenging because causal methods typically rely on complex and difficult-to-verify assumptions, and cross-validation is not applicable since ground truth causal quantities are unobserved. In this work, we propose CAusal Method Predictor (CAMP), a framework for pred… ▽ More For a given causal question, it is important to efficiently decide which causal inference method to use for a given dataset. This is challenging because causal methods typically rely on complex and difficult-to-verify assumptions, and cross-validation is not applicable since ground truth causal quantities are unobserved. In this work, we propose CAusal Method Predictor (CAMP), a framework for predicting the best method for a given dataset. To this end, we generate datasets from a diverse set of synthetic causal models, score the candidate methods, and train a model to directly predict the highest-scoring method for that dataset. Next, by formulating a self-supervised pre-training objective centered on dataset assumptions relevant for causal inference, we significantly reduce the need for costly labeled data and enhance training efficiency. Our strategy learns to map implicit dataset properties to the best method in a data-driven manner. In our experiments, we focus on method prediction for causal discovery. CAMP outperforms selecting any individual candidate method and demonstrates promising generalization to unseen semi-synthetic and real-world benchmarks. △ Less

Submitted 8 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.01902 [pdf, other]

High Precision Causal Model Evaluation with Conditional Randomization

Authors: Chao Ma, Cheng Zhang

Abstract: The gold standard for causal model evaluation involves comparing model predictions with true effects estimated from randomized controlled trials (RCT). However, RCTs are not always feasible or ethical to perform. In contrast, conditionally randomized experiments based on inverse probability weighting (IPW) offer a more realistic approach but may suffer from high estimation variance. To tackle this… ▽ More The gold standard for causal model evaluation involves comparing model predictions with true effects estimated from randomized controlled trials (RCT). However, RCTs are not always feasible or ethical to perform. In contrast, conditionally randomized experiments based on inverse probability weighting (IPW) offer a more realistic approach but may suffer from high estimation variance. To tackle this challenge and enhance causal model evaluation in real-world conditional randomization settings, we introduce a novel low-variance estimator for causal error, dubbed as the pairs estimator. By applying the same IPW estimator to both the model and true experimental effects, our estimator effectively cancels out the variance due to IPW and achieves a smaller asymptotic variance. Empirical studies demonstrate the improved of our estimator, highlighting its potential on achieving near-RCT performance. Our method offers a simple yet powerful solution to evaluate causal inference models in conditional randomization settings without complicated modification of the IPW estimator itself, paving the way for more robust and reliable model assessments. △ Less

Submitted 3 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023 Camera Ready version

arXiv:2310.20224 [pdf, other]

Choose A Table: Tensor Dirichlet Process Multinomial Mixture Model with Graphs for Passenger Trajectory Clustering

Authors: Ziyue Li, Hao Yan, Chen Zhang, Lijun Sun, Wolfgang Ketter, Fugee Tsung

Abstract: Passenger clustering based on trajectory records is essential for transportation operators. However, existing methods cannot easily cluster the passengers due to the hierarchical structure of the passenger trip information, including multiple trips within each passenger and multi-dimensional information about each trip. Furthermore, existing approaches rely on an accurate specification of the clus… ▽ More Passenger clustering based on trajectory records is essential for transportation operators. However, existing methods cannot easily cluster the passengers due to the hierarchical structure of the passenger trip information, including multiple trips within each passenger and multi-dimensional information about each trip. Furthermore, existing approaches rely on an accurate specification of the clustering number to start. Finally, existing methods do not consider spatial semantic graphs such as geographical proximity and functional similarity between the locations. In this paper, we propose a novel tensor Dirichlet Process Multinomial Mixture model with graphs, which can preserve the hierarchical structure of the multi-dimensional trip information and cluster them in a unified one-step manner with the ability to determine the number of clusters automatically. The spatial graphs are utilized in community detection to link the semantic neighbors. We further propose a tensor version of Collapsed Gibbs Sampling method with a minimum cluster size requirement. A case study based on Hong Kong metro passenger data is conducted to demonstrate the automatic process of cluster amount evolution and better cluster quality measured by within-cluster compactness and cross-cluster separateness. The code is available at https://github.com/bonaldli/TensorDPMM-G. △ Less

Submitted 31 October, 2023; originally announced October 2023.

Comments: Accepted in ACM SIGSPATIAL 2023. arXiv admin note: substantial text overlap with arXiv:2306.13794

arXiv:2310.17153 [pdf, other]

Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration

Authors: Longlin Yu, Tianyu Xie, Yu Zhu, Tong Yang, Xiangyu Zhang, Cheng Zhang

Abstract: Semi-implicit variational inference (SIVI) has been introduced to expand the analytical variational families by defining expressive semi-implicit distributions in a hierarchical manner. However, the single-layer architecture commonly used in current SIVI methods can be insufficient when the target posterior has complicated structures. In this paper, we propose hierarchical semi-implicit variationa… ▽ More Semi-implicit variational inference (SIVI) has been introduced to expand the analytical variational families by defining expressive semi-implicit distributions in a hierarchical manner. However, the single-layer architecture commonly used in current SIVI methods can be insufficient when the target posterior has complicated structures. In this paper, we propose hierarchical semi-implicit variational inference, called HSIVI, which generalizes SIVI to allow more expressive multi-layer construction of semi-implicit distributions. By introducing auxiliary distributions that interpolate between a simple base distribution and the target distribution, the conditional layers can be trained by progressively matching these auxiliary distributions one layer after another. Moreover, given pre-trained score networks, HSIVI can be used to accelerate the sampling process of diffusion models with the score matching objective. We show that HSIVI significantly enhances the expressiveness of SIVI on several Bayesian inference problems with complicated target distributions. When used for diffusion model acceleration, we show that HSIVI can produce high quality samples comparable to or better than the existing fast diffusion model based samplers with a small number of function evaluations on various datasets. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Comments: 25 pages, 13 figures, NeurIPS 2023

arXiv:2310.16516 [pdf, other]

Particle-based Variational Inference with Generalized Wasserstein Gradient Flow

Authors: Ziheng Cheng, Shiyue Zhang, Longlin Yu, Cheng Zhang

Abstract: Particle-based variational inference methods (ParVIs) such as Stein variational gradient descent (SVGD) update the particles based on the kernelized Wasserstein gradient flow for the Kullback-Leibler (KL) divergence. However, the design of kernels is often non-trivial and can be restrictive for the flexibility of the method. Recent works show that functional gradient flow approximations with quadr… ▽ More Particle-based variational inference methods (ParVIs) such as Stein variational gradient descent (SVGD) update the particles based on the kernelized Wasserstein gradient flow for the Kullback-Leibler (KL) divergence. However, the design of kernels is often non-trivial and can be restrictive for the flexibility of the method. Recent works show that functional gradient flow approximations with quadratic form regularization terms can improve performance. In this paper, we propose a ParVI framework, called generalized Wasserstein gradient descent (GWG), based on a generalized Wasserstein gradient flow of the KL divergence, which can be viewed as a functional gradient method with a broader class of regularizers induced by convex functions. We show that GWG exhibits strong convergence guarantees. We also provide an adaptive version that automatically chooses Wasserstein metric to accelerate convergence. In experiments, we demonstrate the effectiveness and efficiency of the proposed framework on both simulated and real data problems. △ Less

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.16428 [pdf, ps, other]

Similarity-driven and Task-driven Models for Diversity of Opinion in Crowdsourcing Markets

Authors: Chen Jason Zhang, Yunrui Liu, Pengcheng Zeng, Ting Wu, Lei Chen, Pan Hui, Fei Hao

Abstract: The recent boom in crowdsourcing has opened up a new avenue for utilizing human intelligence in the realm of data analysis. This innovative approach provides a powerful means for connecting online workers to tasks that cannot effectively be done solely by machines or conducted by professional experts due to cost constraints. Within the field of social science, four elements are required to constru… ▽ More The recent boom in crowdsourcing has opened up a new avenue for utilizing human intelligence in the realm of data analysis. This innovative approach provides a powerful means for connecting online workers to tasks that cannot effectively be done solely by machines or conducted by professional experts due to cost constraints. Within the field of social science, four elements are required to construct a sound crowd - Diversity of Opinion, Independence, Decentralization and Aggregation. However, while the other three components have already been investigated and implemented in existing crowdsourcing platforms, 'Diversity of Opinion' has not been functionally enabled yet. From a computational point of view, constructing a wise crowd necessitates quantitatively modeling and taking diversity into account. There are usually two paradigms in a crowdsourcing marketplace for worker selection: building a crowd to wait for tasks to come and selecting workers for a given task. We propose similarity-driven and task-driven models for both paradigms. Also, we develop efficient and effective algorithms for recruiting a limited number of workers with optimal diversity in both models. To validate our solutions, we conduct extensive experiments using both synthetic datasets and real data sets. △ Less

Submitted 28 February, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: 37 pages, 11 figures

arXiv:2310.16336 [pdf, other]

SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process

Authors: Zichong Li, Yanbo Xu, Simiao Zuo, Haoming Jiang, Chao Zhang, Tuo Zhao, Hongyuan Zha

Abstract: Transformer Hawkes process models have shown to be successful in modeling event sequence data. However, most of the existing training methods rely on maximizing the likelihood of event sequences, which involves calculating some intractable integral. Moreover, the existing methods fail to provide uncertainty quantification for model predictions, e.g., confidence intervals for the predicted event's… ▽ More Transformer Hawkes process models have shown to be successful in modeling event sequence data. However, most of the existing training methods rely on maximizing the likelihood of event sequences, which involves calculating some intractable integral. Moreover, the existing methods fail to provide uncertainty quantification for model predictions, e.g., confidence intervals for the predicted event's arrival time. To address these issues, we propose SMURF-THP, a score-based method for learning Transformer Hawkes process and quantifying prediction uncertainty. Specifically, SMURF-THP learns the score function of events' arrival time based on a score-matching objective that avoids the intractable computation. With such a learned score function, we can sample arrival time of events from the predictive distribution. This naturally allows for the quantification of uncertainty by computing confidence intervals over the generated samples. We conduct extensive experiments in both event type prediction and uncertainty quantification of arrival time. In all the experiments, SMURF-THP outperforms existing likelihood-based methods in confidence calibration while exhibiting comparable prediction accuracy. △ Less

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.15411 [pdf, ps, other]

Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach

Authors: Yinan Li, Chicheng Zhang

Abstract: We study the problem of computationally and label efficient PAC active learning $d$-dimensional halfspaces with Tsybakov Noise~\citep{tsybakov2004optimal} under structured unlabeled data distributions. Inspired by~\cite{diakonikolas2020learning}, we prove that any approximate first-order stationary point of a smooth nonconvex loss function yields a halfspace with a low excess error guarantee. In l… ▽ More We study the problem of computationally and label efficient PAC active learning $d$-dimensional halfspaces with Tsybakov Noise~\citep{tsybakov2004optimal} under structured unlabeled data distributions. Inspired by~\cite{diakonikolas2020learning}, we prove that any approximate first-order stationary point of a smooth nonconvex loss function yields a halfspace with a low excess error guarantee. In light of the above structural result, we design a nonconvex optimization-based algorithm with a label complexity of $\tilde{O}(d (\frac{1}ε)^{\frac{8-6α}{3α-1}})$\footnote{In the main body of this work, we use $\tilde{O}(\cdot), \tildeΘ(\cdot)$ to hide factors of the form $\polylog(d, \frac{1}ε, \frac{1}δ)$}, under the assumption that the Tsybakov noise parameter $α\in (\frac13, 1]$, which narrows down the gap between the label complexities of the previously known efficient passive or active algorithms~\citep{diakonikolas2020polynomial,zhang2021improved} and the information-theoretic lower bound in this setting. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: 29 pages

arXiv:2310.11428 [pdf, other]

Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression

Authors: Adam Block, Dylan J. Foster, Akshay Krishnamurthy, Max Simchowitz, Cyril Zhang

Abstract: This work studies training instabilities of behavior cloning with deep neural networks. We observe that minibatch SGD updates to the policy network during training result in sharp oscillations in long-horizon rewards, despite negligibly affecting the behavior cloning loss. We empirically disentangle the statistical and computational causes of these oscillations, and find them to stem from the chao… ▽ More This work studies training instabilities of behavior cloning with deep neural networks. We observe that minibatch SGD updates to the policy network during training result in sharp oscillations in long-horizon rewards, despite negligibly affecting the behavior cloning loss. We empirically disentangle the statistical and computational causes of these oscillations, and find them to stem from the chaotic propagation of minibatch SGD noise through unstable closed-loop dynamics. While SGD noise is benign in the single-step action prediction objective, it results in catastrophic error accumulation over long horizons, an effect we term gradient variance amplification (GVA). We show that many standard mitigation techniques do not alleviate GVA, but find an exponential moving average (EMA) of iterates to be surprisingly effective at doing so. We illustrate the generality of this phenomenon by showing the existence of GVA and its amelioration by EMA in both continuous control and autoregressive language generation. Finally, we provide theoretical vignettes that highlight the benefits of EMA in alleviating GVA and shed light on the extent to which classical convex models can help in understanding the benefits of iterate averaging in deep learning. △ Less

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.09553 [pdf, other]

ARTree: A Deep Autoregressive Model for Phylogenetic Inference

Authors: Tianyu Xie, Cheng Zhang

Abstract: Designing flexible probabilistic models over tree topologies is important for develo** efficient phylogenetic inference methods. To do that, previous works often leverage the similarity of tree topologies via hand-engineered heuristic features which would require pre-sampled tree topologies and may suffer from limited approximation capability. In this paper, we propose a deep autoregressive mode… ▽ More Designing flexible probabilistic models over tree topologies is important for develo** efficient phylogenetic inference methods. To do that, previous works often leverage the similarity of tree topologies via hand-engineered heuristic features which would require pre-sampled tree topologies and may suffer from limited approximation capability. In this paper, we propose a deep autoregressive model for phylogenetic inference based on graph neural networks (GNNs), called ARTree. By decomposing a tree topology into a sequence of leaf node addition operations and modeling the involved conditional distributions based on learnable topological features via GNNs, ARTree can provide a rich family of distributions over the entire tree topology space that have simple sampling algorithms and density estimation procedures, without using heuristic features. We demonstrate the effectiveness and efficiency of our method on a benchmark of challenging real data tree topology density estimation and variational Bayesian phylogenetic inference problems. △ Less

Submitted 14 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023 spotlight

arXiv:2310.08843 [pdf]

A Longitudinal Analysis about the Effect of Air Pollution on Astigmatism for Children and Young Adults

Authors: Lin An, Qiuyue Hu, Jieying Guan, Yingting Zhu, Chenyao Jiang, Xiaoyun Zhong, Shuyue Ma, Dongmei Yu, Canyang Zhang, Yehong Zhuo, Peiwu Qin

Abstract: Purpose: This study aimed to investigate the correlation between air pollution and astigmatism, considering the detrimental effects of air pollution on respiratory, cardiovascular, and eye health. Methods: A longitudinal study was conducted with 127,709 individuals aged 4-27 years from 9 cities in Guangdong Province, China, spanning from 2019 to 2021. Astigmatism was measured using cylinder values… ▽ More Purpose: This study aimed to investigate the correlation between air pollution and astigmatism, considering the detrimental effects of air pollution on respiratory, cardiovascular, and eye health. Methods: A longitudinal study was conducted with 127,709 individuals aged 4-27 years from 9 cities in Guangdong Province, China, spanning from 2019 to 2021. Astigmatism was measured using cylinder values. Multiple measurements were taken at intervals of at least 1 year. Various exposure windows were used to assess the lagged impacts of air pollution on astigmatism. A panel data model with random effects was constructed to analyze the relationship between pollutant exposure and astigmatism. Results: The study revealed significant associations between astigmatism and exposure to carbon monoxide (CO), nitrogen dioxide (NO2), and particulate matter (PM2.5) over time. A 10 μg/m3 increase in a 3-year exposure window of NO2 and PM2.5 was associated with a decrease in cylinder value of -0.045 diopters and -0.017 diopters, respectively. A 0.1 mg/m3 increase in CO concentration within a 2-year exposure window correlated with a decrease in cylinder value of -0.009 diopters. No significant relationships were found between PM10 exposure and astigmatism. Conclusion: This study concluded that greater exposure to NO2 and PM2.5 over longer periods aggravates astigmatism. The negative effect of CO on astigmatism peaks in the exposure window of 2 years prior to examination and diminishes afterward. No significant association was found between PM10 exposure and astigmatism, suggesting that gaseous and smaller particulate pollutants have easier access to human eyes, causing heterogeneous morphological changes to the eyeball. △ Less

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.00809 [pdf, other]

Towards Causal Foundation Model: on Duality between Causal Inference and Attention

Authors: Jiaqi Zhang, Joel Jennings, Agrin Hilmkil, Nick Pawlowski, Cheng Zhang, Chao Ma

Abstract: Foundation models have brought changes to the landscape of machine learning, demonstrating sparks of human-level intelligence across a diverse array of tasks. However, a gap persists in complex tasks such as causal inference, primarily due to challenges associated with intricate reasoning steps and high numerical precision requirements. In this work, we take a first step towards building causally-… ▽ More Foundation models have brought changes to the landscape of machine learning, demonstrating sparks of human-level intelligence across a diverse array of tasks. However, a gap persists in complex tasks such as causal inference, primarily due to challenges associated with intricate reasoning steps and high numerical precision requirements. In this work, we take a first step towards building causally-aware foundation models for treatment effect estimations. We propose a novel, theoretically justified method called Causal Inference with Attention (CInA), which utilizes multiple unlabeled datasets to perform self-supervised causal learning, and subsequently enables zero-shot causal inference on unseen tasks with new data. This is based on our theoretical results that demonstrate the primal-dual connection between optimal covariate balancing and self-attention, facilitating zero-shot causal inference through the final layer of a trained transformer-type architecture. We demonstrate empirically that CInA effectively generalizes to out-of-distribution datasets and various real-world datasets, matching or even surpassing traditional per-dataset methodologies. These results provide compelling evidence that our method has the potential to serve as a step** stone for the development of causal foundation models. △ Less

Submitted 3 June, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

arXiv:2309.03800 [pdf, other]

Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck

Authors: Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

Abstract: In modern deep learning, algorithmic choices (such as width, depth, and learning rate) are known to modulate nuanced resource tradeoffs. This work investigates how these complexities necessarily arise for feature learning in the presence of computational-statistical gaps. We begin by considering offline sparse parity learning, a supervised classification problem which admits a statistical query lo… ▽ More In modern deep learning, algorithmic choices (such as width, depth, and learning rate) are known to modulate nuanced resource tradeoffs. This work investigates how these complexities necessarily arise for feature learning in the presence of computational-statistical gaps. We begin by considering offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron. This lower bound can be interpreted as a multi-resource tradeoff frontier: successful learning can only occur if one is sufficiently rich (large model), knowledgeable (large dataset), patient (many training iterations), or lucky (many random guesses). We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting. Here, width plays the role of parallel search: it amplifies the probability of finding "lottery ticket" neurons, which learn sparse features more sample-efficiently. Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning. We demonstrate improved sample efficiency on tabular classification benchmarks by using wide, sparsely-initialized MLP models; these networks sometimes outperform tuned random forests. △ Less

Submitted 30 October, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: v2: NeurIPS 2023 camera-ready updates

arXiv:2308.10014 [pdf, other]

Semi-Implicit Variational Inference via Score Matching

Authors: Longlin Yu, Cheng Zhang

Abstract: Semi-implicit variational inference (SIVI) greatly enriches the expressiveness of variational families by considering implicit variational distributions defined in a hierarchical manner. However, due to the intractable densities of variational distributions, current SIVI approaches often use surrogate evidence lower bounds (ELBOs) or employ expensive inner-loop MCMC runs for unbiased ELBOs for tra… ▽ More Semi-implicit variational inference (SIVI) greatly enriches the expressiveness of variational families by considering implicit variational distributions defined in a hierarchical manner. However, due to the intractable densities of variational distributions, current SIVI approaches often use surrogate evidence lower bounds (ELBOs) or employ expensive inner-loop MCMC runs for unbiased ELBOs for training. In this paper, we propose SIVI-SM, a new method for SIVI based on an alternative training objective via score matching. Leveraging the hierarchical structure of semi-implicit variational families, the score matching objective allows a minimax formulation where the intractable variational densities can be naturally handled with denoising score matching. We show that SIVI-SM closely matches the accuracy of MCMC and outperforms ELBO-based SIVI methods in a variety of Bayesian inference tasks. △ Less

Submitted 19 August, 2023; originally announced August 2023.

Comments: 17 pages, 8 figures; ICLR 2023

arXiv:2308.04428 [pdf, other]

Meta-Learning Operators to Optimality from Multi-Task Non-IID Data

Authors: Thomas T. C. K. Zhang, Leonardo F. Toso, James Anderson, Nikolai Matni

Abstract: A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically gr… ▽ More A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically grounding these merits, we propose a general setting of recovering linear operators $M$ from noisy vector measurements $y = Mx + w$, where the covariates $x$ may be both non-i.i.d. and non-isotropic. We demonstrate that existing isotropy-agnostic meta-learning approaches incur biases on the representation update, which causes the scaling of the noise terms to lose favorable dependence on the number of source tasks. This in turn can cause the sample complexity of representation learning to be bottlenecked by the single-task data size. We introduce an adaptation, $\texttt{De-bias & Feature-Whiten}$ ($\texttt{DFW}$), of the popular alternating minimization-descent (AMD) scheme proposed in Collins et al., (2021), and establish linear convergence to the optimal representation with noise level scaling down with the $\textit{total}$ source data size. This leads to generalization bounds on the same order as an oracle empirical risk minimizer. We verify the vital importance of $\texttt{DFW}$ on various numerical simulations. In particular, we show that vanilla alternating-minimization descent fails catastrophically even for iid, but mildly non-isotropic data. Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications, such as in controls and dynamical systems. △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2307.13917 [pdf, other]

BayesDAG: Gradient-Based Posterior Inference for Causal Discovery

Authors: Yashas Annadani, Nick Pawlowski, Joel Jennings, Stefan Bauer, Cheng Zhang, Wenbo Gong

Abstract: Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existin… ▽ More Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existing methods are either limited to variational inference on node permutation matrices for linear causal models, leading to compromised inference accuracy, or continuous relaxation of adjacency matrices constrained by a DAG regularizer, which cannot ensure resulting graphs are DAGs. In this work, we introduce a scalable Bayesian causal discovery framework based on a combination of stochastic gradient Markov Chain Monte Carlo (SG-MCMC) and Variational Inference (VI) that overcomes these limitations. Our approach directly samples DAGs from the posterior without requiring any DAG regularization, simultaneously draws function parameter samples and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to the permutation-based DAG learning, which opens up possibilities of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework applying gradient-based MCMC sampling for causal discovery. Empirical evaluation on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines. △ Less

Submitted 8 December, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

Comments: NeurIPS 2023

arXiv:2307.11685 [pdf, other]

Towards Generalizable Reinforcement Learning for Trade Execution

Authors: Chuheng Zhang, Yitong Duan, Xiaoyu Chen, Jianyu Chen, Jian Li, Li Zhao

Abstract: Optimized trade execution is to sell (or buy) a given amount of assets in a given time with the lowest possible trading cost. Recently, reinforcement learning (RL) has been applied to optimized trade execution to learn smarter policies from market data. However, we find that many existing RL methods exhibit considerable overfitting which prevents them from real deployment. In this paper, we provid… ▽ More Optimized trade execution is to sell (or buy) a given amount of assets in a given time with the lowest possible trading cost. Recently, reinforcement learning (RL) has been applied to optimized trade execution to learn smarter policies from market data. However, we find that many existing RL methods exhibit considerable overfitting which prevents them from real deployment. In this paper, we provide an extensive study on the overfitting problem in optimized trade execution. First, we model the optimized trade execution as offline RL with dynamic context (ORDC), where the context represents market variables that cannot be influenced by the trading policy and are collected in an offline manner. Under this framework, we derive the generalization bound and find that the overfitting issue is caused by large context space and limited context samples in the offline setting. Accordingly, we propose to learn compact representations for context to address the overfitting problem, either by leveraging prior knowledge or in an end-to-end manner. To evaluate our algorithms, we also implement a carefully designed simulator based on historical limit order book (LOB) data to provide a high-fidelity benchmark for different algorithms. Our experiments on the high-fidelity simulator demonstrate that our algorithms can effectively alleviate overfitting and achieve better performance. △ Less

Submitted 11 May, 2023; originally announced July 2023.

Comments: Accepted by IJCAI-23

arXiv:2307.07320 [pdf, other]

Adaptive Linear Estimating Equations

Authors: Mufang Ying, Koulik Khamaru, Cun-Hui Zhang

Abstract: Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes. Despite its advantages, such data collection mechanism often introduces complexities to the statistical inference procedure. For instance, the ordinary least squares (OLS) estimator in an adaptive linear regression model can exhibit non-normal asymptotic behavior, posing c… ▽ More Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes. Despite its advantages, such data collection mechanism often introduces complexities to the statistical inference procedure. For instance, the ordinary least squares (OLS) estimator in an adaptive linear regression model can exhibit non-normal asymptotic behavior, posing challenges for accurate inference and interpretation. In this paper, we propose a general method for constructing debiased estimator which remedies this issue. It makes use of the idea of adaptive linear estimating equations, and we establish theoretical guarantees of asymptotic normality, supplemented by discussions on achieving near-optimal asymptotic variance. A salient feature of our estimator is that in the context of multi-armed bandits, our estimator retains the non-asymptotic performance of the least square estimator while obtaining asymptotic normality property. Consequently, this work helps connect two fruitful paradigms of adaptive inference: a) non-asymptotic inference using concentration inequalities and b) asymptotic inference via asymptotic normality. △ Less

Submitted 7 November, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

Comments: Paper is accepted at NeurIPS 2023

arXiv:2307.07264 [pdf, ps, other]

On Interpolating Experts and Multi-Armed Bandits

Authors: Houshuang Chen, Yuchen He, Chihao Zhang

Abstract: Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,\dots,m_K)\in \mathbb{N}^K$, an instance of $\mathbf{m}$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains… ▽ More Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,\dots,m_K)\in \mathbb{N}^K$, an instance of $\mathbf{m}$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI, where the goal is to identify the arm with minimum loss with as few rounds as possible. We show that the minimax regret of $\mathbf{m}$-MAB is $Θ\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$ and the minimum number of pulls for an $(ε,0.05)$-PAC algorithm of $\mathbf{m}$-BAI is $Θ\left(\frac{1}{ε^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Both our upper bounds and lower bounds for $\mathbf{m}$-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the clique cover and related graph parameters. As consequences, we obtained tight minimax regret bounds for several families of feedback graphs. △ Less

Submitted 4 August, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

arXiv:2307.07191 [pdf, other]

Benchmarks and Custom Package for Electrical Load Forecasting

Authors: Zhixian Wang, Qingsong Wen, Chaoli Zhang, Liang Sun, Leandro Von Krannichfeldt, Yi Wang

Abstract: Load forecasting is of great significance in the power industry as it can provide a reference for subsequent tasks such as power grid dispatch, thus bringing huge economic benefits. However, there are many differences between load forecasting and traditional time series forecasting. On the one hand, load forecasting aims to minimize the cost of subsequent tasks such as power grid dispatch, rather… ▽ More Load forecasting is of great significance in the power industry as it can provide a reference for subsequent tasks such as power grid dispatch, thus bringing huge economic benefits. However, there are many differences between load forecasting and traditional time series forecasting. On the one hand, load forecasting aims to minimize the cost of subsequent tasks such as power grid dispatch, rather than simply pursuing prediction accuracy. On the other hand, the load is largely influenced by many external factors, such as temperature or calendar variables. In addition, the scale of predictions (such as building-level loads and aggregated-level loads) can also significantly impact the predicted results. In this paper, we provide a comprehensive load forecasting archive, which includes load domain-specific feature engineering to help forecasting models better model load data. In addition, different from the traditional loss function which only aims for accuracy, we also provide a method to customize the loss function based on the forecasting error, integrating it into our forecasting framework. Based on this, we conducted extensive experiments on load data at different levels, providing a reference for researchers to compare different load forecasting models. △ Less

Submitted 14 July, 2023; originally announced July 2023.

arXiv:2307.04226 [pdf, other]

Seismic Data Interpolation based on Denoising Diffusion Implicit Models with Resampling

Authors: Xiaoli Wei, Chunxia Zhang, Hongtao Wang, Chengli Tan, Deng Xiong, Baisong Jiang, Jiangshe Zhang, Sang-Woon Kim

Abstract: The incompleteness of the seismic data caused by missing traces along the spatial extension is a common issue in seismic acquisition due to the existence of obstacles and economic constraints, which severely impairs the imaging quality of subsurface geological structures. Recently, deep learningbased seismic interpolation methods have attained promising progress, while achieving stable training of… ▽ More The incompleteness of the seismic data caused by missing traces along the spatial extension is a common issue in seismic acquisition due to the existence of obstacles and economic constraints, which severely impairs the imaging quality of subsurface geological structures. Recently, deep learningbased seismic interpolation methods have attained promising progress, while achieving stable training of generative adversarial networks is not easy, and performance degradation is usually notable if the missing patterns in the testing and training do not match. In this paper, we propose a novel seismic denoising diffusion implicit model with resampling. The model training is established on the denoising diffusion probabilistic model, where U-Net is equipped with the multi-head self-attention to match the noise in each step. The cosine noise schedule, serving as the global noise configuration, promotes the high utilization of known trace information by accelerating the passage of the excessive noise stages. The model inference utilizes the denoising diffusion implicit model, conditioning on the known traces, to enable high-quality interpolation with fewer diffusion steps. To enhance the coherency between the known traces and the missing traces within each reverse step, the inference process integrates a resampling strategy to achieve an information recap on the former interpolated traces. Extensive experiments conducted on synthetic and field seismic data validate the superiority of our model and its robustness to various missing patterns. In addition, uncertainty quantification and ablation studies are also investigated. △ Less

Submitted 13 July, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

Comments: 14 pages, 13 figures

arXiv:2307.03340 [pdf, other]

Calibrating Car-Following Models via Bayesian Dynamic Regression

Authors: Chengyuan Zhang, Wenshuo Wang, Lijun Sun

Abstract: Car-following behavior modeling is critical for understanding traffic flow dynamics and develo** high-fidelity microscopic simulation models. Most existing impulse-response car-following models prioritize computational efficiency and interpretability by using a parsimonious nonlinear function based on immediate preceding state observations. However, this approach disregards historical informatio… ▽ More Car-following behavior modeling is critical for understanding traffic flow dynamics and develo** high-fidelity microscopic simulation models. Most existing impulse-response car-following models prioritize computational efficiency and interpretability by using a parsimonious nonlinear function based on immediate preceding state observations. However, this approach disregards historical information, limiting its ability to explain real-world driving data. Consequently, serially correlated residuals are commonly observed when calibrating these models with actual trajectory data, hindering their ability to capture complex and stochastic phenomena. To address this limitation, we propose a dynamic regression framework incorporating time series models, such as autoregressive processes, to capture error dynamics. This statistically rigorous calibration outperforms the simple assumption of independent errors and enables more accurate simulation and prediction by leveraging higher-order historical information. We validate the effectiveness of our framework using HighD and OpenACC data, demonstrating improved probabilistic simulations. In summary, our framework preserves the parsimonious nature of traditional car-following models while offering enhanced probabilistic simulations. The code of this work is available at https://github.com/Chengyuan-Zhang/IDM_Bayesian_Calibration. △ Less

Submitted 11 June, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

arXiv:2307.03034 [pdf, ps, other]

PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models

Authors: Keqin Liu, Chengzhong Zhang

Abstract: In this paper, we consider a general observation model for restless multi-armed bandit problems. The operation of the player needs to be based on certain feedback mechanism that is error-prone due to resource constraints or environmental or intrinsic noises. By establishing a general probabilistic model for dynamics of feedback/observation, we formulate the problem as a restless bandit with a coun… ▽ More In this paper, we consider a general observation model for restless multi-armed bandit problems. The operation of the player needs to be based on certain feedback mechanism that is error-prone due to resource constraints or environmental or intrinsic noises. By establishing a general probabilistic model for dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space starting from an arbitrary initial belief (a priori information). We apply the achievable region method with partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process to transform the problem into which the AG algorithm of Niño-Mora and Bertsimas for finite-state problems can be applied to. Numerical experiments show that our algorithm has an excellent performance. △ Less

Submitted 3 July, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

arXiv:2306.15744 [pdf, ps, other]

Ticketed Learning-Unlearning Schemes

Authors: Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Ayush Sekhari, Chiyuan Zhang

Abstract: We consider the learning--unlearning paradigm defined as follows. First given a dataset, the goal is to learn a good predictor, such as one minimizing a certain loss. Subsequently, given any subset of examples that wish to be unlearnt, the goal is to learn, without the knowledge of the original training dataset, a good predictor that is identical to the predictor that would have been produced when… ▽ More We consider the learning--unlearning paradigm defined as follows. First given a dataset, the goal is to learn a good predictor, such as one minimizing a certain loss. Subsequently, given any subset of examples that wish to be unlearnt, the goal is to learn, without the knowledge of the original training dataset, a good predictor that is identical to the predictor that would have been produced when learning from scratch on the surviving examples. We propose a new ticketed model for learning--unlearning wherein the learning algorithm can send back additional information in the form of a small-sized (encrypted) ``ticket'' to each participating training example, in addition to retaining a small amount of ``central'' information for later. Subsequently, the examples that wish to be unlearnt present their tickets to the unlearning algorithm, which additionally uses the central information to return a new predictor. We provide space-efficient ticketed learning--unlearning schemes for a broad family of concept classes, including thresholds, parities, intersection-closed classes, among others. En route, we introduce the count-to-zero problem, where during unlearning, the goal is to simply know if there are any examples that survived. We give a ticketed learning--unlearning scheme for this problem that relies on the construction of Sperner families with certain properties, which might be of independent interest. △ Less

Submitted 27 June, 2023; originally announced June 2023.

Comments: Conference on Learning Theory (COLT) 2023

arXiv:2306.14388 [pdf, other]

Nonlinear Functional Principal Component Analysis Using Neural Networks

Authors: Rou Zhong, Chunming Zhang, **gxiao Zhang

Abstract: Functional principal component analysis (FPCA) is an important technique for dimension reduction in functional data analysis (FDA). Classical FPCA method is based on the Karhunen-Loève expansion, which assumes a linear structure of the observed functional data. However, the assumption may not always be satisfied, and the FPCA method can become inefficient when the data deviates from the linear ass… ▽ More Functional principal component analysis (FPCA) is an important technique for dimension reduction in functional data analysis (FDA). Classical FPCA method is based on the Karhunen-Loève expansion, which assumes a linear structure of the observed functional data. However, the assumption may not always be satisfied, and the FPCA method can become inefficient when the data deviates from the linear assumption. In this paper, we propose a novel FPCA method that is suitable for data with a nonlinear structure by neural network approach. We construct networks that can be applied to functional data and explore the corresponding universal approximation property. The main use of our proposed nonlinear FPCA method is curve reconstruction. We conduct a simulation study to evaluate the performance of our method. The proposed method is also applied to two real-world data sets to further demonstrate its superiority. △ Less

Submitted 25 June, 2023; originally announced June 2023.

arXiv:2306.13949 [pdf]

doi 10.1111/biom.13891

Analysis of dynamic restricted mean survival time based on pseudo-observations

Authors: Zi**g Yang, Chengfeng Zhang, Yawen Hou, Zheng Chen

Abstract: In clinical follow-up studies with a time-to-event end point, the difference in the restricted mean survival time (RMST) is a suitable substitute for the hazard ratio (HR). However, the RMST only measures the survival of patients over a period of time from the baseline and cannot reflect changes in life expectancy over time. Based on the RMST, we study the conditional restricted mean survival time… ▽ More In clinical follow-up studies with a time-to-event end point, the difference in the restricted mean survival time (RMST) is a suitable substitute for the hazard ratio (HR). However, the RMST only measures the survival of patients over a period of time from the baseline and cannot reflect changes in life expectancy over time. Based on the RMST, we study the conditional restricted mean survival time (cRMST) by estimating life expectancy in the future according to the time that patients have survived, reflecting the dynamic survival status of patients during follow-up. In this paper, we introduce the estimation method of cRMST based on pseudo-observations, the construction of test statistics according to the difference in the cRMST (cRMSTd), and the establishment of the robust dynamic prediction model using the landmark method. Simulation studies are employed to evaluate the statistical properties of these methods, which are also applied to two real examples. The simulation results show that the estimation of the cRMST is accurate and the cRMSTd test performs well. In addition, the dynamic RMST model has high accuracy in coefficient estimation and better predictive performance than the static RMST model. The hypothesis test proposed in this paper has a wide range of applicability, and the dynamic RMST model can predict patients' life expectancy from any prediction time, considering the time-dependent covariates and time-varying effects of covariates. △ Less

Submitted 24 June, 2023; originally announced June 2023.

Comments: Biometrics. 2023

Report number: 13891

Showing 1–50 of 443 results for author: Zhang, C