Search | arXiv e-print repository

Temporal label recovery from noisy dynamical data

Authors: Yuehaw Khoo, Xin T. Tong, Wanjie Wang, Yuguan Wang

Abstract: Analyzing dynamical data often requires information of the temporal labels, but such information is unavailable in many applications. Recovery of these temporal labels, closely related to the seriation or sequencing problem, becomes crucial in the study. However, challenges arise due to the nonlinear nature of the data and the complexity of the underlying dynamical system, which may be periodic or… ▽ More Analyzing dynamical data often requires information of the temporal labels, but such information is unavailable in many applications. Recovery of these temporal labels, closely related to the seriation or sequencing problem, becomes crucial in the study. However, challenges arise due to the nonlinear nature of the data and the complexity of the underlying dynamical system, which may be periodic or non-periodic. Additionally, noise within the feature space complicates the theoretical analysis. Our work develops spectral algorithms that leverage manifold learning concepts to recover temporal labels from noisy data. We first construct the graph Laplacian of the data, and then employ the second (and the third) Fiedler vectors to recover temporal labels. This method can be applied to both periodic and aperiodic cases. It also does not require monotone properties on the similarity matrix, which are commonly assumed in existing spectral seriation algorithms. We develop the $\ell_{\infty}$ error of our estimators for the temporal labels and ranking, without assumptions on the eigen-gap. In numerical analysis, our method outperforms spectral seriation algorithms based on a similarity matrix. The performance of our algorithms is further demonstrated on a synthetic biomolecule data example. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 20 pages, 4 figures

arXiv:2406.10917 [pdf, other]

Bayesian Intervention Optimization for Causal Discovery

Authors: Yuxuan Wang, Mingzhou Liu, Xinwei Sun, Wei Wang, Yizhou Wang

Abstract: Causal discovery is crucial for understanding complex systems and informing decisions. While observational data can uncover causal relationships under certain assumptions, it often falls short, making active interventions necessary. Current methods, such as Bayesian and graph-theoretical approaches, do not prioritize decision-making and often rely on ideal conditions or information gain, which is… ▽ More Causal discovery is crucial for understanding complex systems and informing decisions. While observational data can uncover causal relationships under certain assumptions, it often falls short, making active interventions necessary. Current methods, such as Bayesian and graph-theoretical approaches, do not prioritize decision-making and often rely on ideal conditions or information gain, which is not directly related to hypothesis testing. We propose a novel Bayesian optimization-based method inspired by Bayes factors that aims to maximize the probability of obtaining decisive and correct evidence. Our approach uses observational data to estimate causal models under different hypotheses, evaluates potential interventions pre-experimentally, and iteratively updates priors to refine interventions. We demonstrate the effectiveness of our method through various experiments. Our contributions provide a robust framework for efficient causal discovery through active interventions, enhancing the practical application of theoretical advancements. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.07536 [pdf, other]

Towards Fundamentally Scalable Model Selection: Asymptotically Fast Update and Selection

Authors: Wenxiao Wang, Weiming Zhuang, Lingjuan Lyu

Abstract: The advancement of deep learning technologies is bringing new models every day, motivating the study of scalable model selection. An ideal model selection scheme should minimally support two operations efficiently over a large pool of candidate models: update, which involves either adding a new candidate model or removing an existing candidate model, and selection, which involves locating highly p… ▽ More The advancement of deep learning technologies is bringing new models every day, motivating the study of scalable model selection. An ideal model selection scheme should minimally support two operations efficiently over a large pool of candidate models: update, which involves either adding a new candidate model or removing an existing candidate model, and selection, which involves locating highly performing models for a given task. However, previous solutions to model selection require high computational complexity for at least one of these two operations. In this work, we target fundamentally (more) scalable model selection that supports asymptotically fast update and asymptotically fast selection at the same time. Firstly, we define isolated model embedding, a family of model selection schemes supporting asymptotically fast update and selection: With respect to the number of candidate models $m$, the update complexity is O(1) and the selection consists of a single sweep over $m$ vectors in addition to O(1) model operations. Isolated model embedding also implies several desirable properties for applications. Secondly, we present Standardized Embedder, an empirical realization of isolated model embedding. We assess its effectiveness by using it to select representations from a pool of 100 pre-trained vision models for classification tasks and measuring the performance gaps between the selected models and the best candidates with a linear probing protocol. Experiments suggest our realization is effective in selecting models with competitive performances and highlight isolated model embedding as a promising direction towards model selection that is fundamentally (more) scalable. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: 19 pages, 8 figures

arXiv:2406.06829 [pdf, other]

Personalized Binomial DAGs Learning with Network Structured Covariates

Authors: Boxin Zhao, Weishi Wang, Dingyuan Zhu, Ziqi Liu, Dong Wang, Zhiqiang Zhang, Jun Zhou, Mladen Kolar

Abstract: The causal dependence in data is often characterized by Directed Acyclic Graphical (DAG) models, widely used in many areas. Causal discovery aims to recover the DAG structure using observational data. This paper focuses on causal discovery with multi-variate count data. We are motivated by real-world web visit data, recording individual user visits to multiple websites. Building a causal diagram c… ▽ More The causal dependence in data is often characterized by Directed Acyclic Graphical (DAG) models, widely used in many areas. Causal discovery aims to recover the DAG structure using observational data. This paper focuses on causal discovery with multi-variate count data. We are motivated by real-world web visit data, recording individual user visits to multiple websites. Building a causal diagram can help understand user behavior in transitioning between websites, inspiring operational strategy. A challenge in modeling is user heterogeneity, as users with different backgrounds exhibit varied behaviors. Additionally, social network connections can result in similar behaviors among friends. We introduce personalized Binomial DAG models to address heterogeneity and network dependency between observations, which are common in real-world applications. To learn the proposed DAG model, we develop an algorithm that embeds the network structure into a dimension-reduced covariate, learns each node's neighborhood to reduce the DAG search space, and explores the variance-mean relation to determine the ordering. Simulations show our algorithm outperforms state-of-the-art competitors in heterogeneous data. We demonstrate its practical usefulness on a real-world web visit dataset. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.00503 [pdf, other]

Schrödinger Bridge with Quadratic State Cost is Exactly Solvable

Authors: Alexis M. H. Teter, Wenqing Wang, Abhishek Halder

Abstract: Schrödinger bridge is a diffusion process that steers a given distribution to another in a prescribed time while minimizing the effort to do so. It can be seen as the stochastic dynamical version of the optimal mass transport, and has growing applications in generative diffusion models and stochastic optimal control. In this work, we propose a regularized variant of the Schrödinger bridge with a q… ▽ More Schrödinger bridge is a diffusion process that steers a given distribution to another in a prescribed time while minimizing the effort to do so. It can be seen as the stochastic dynamical version of the optimal mass transport, and has growing applications in generative diffusion models and stochastic optimal control. In this work, we propose a regularized variant of the Schrödinger bridge with a quadratic state cost-to-go that incentivizes the optimal sample paths to stay close to a nominal level. Unlike the conventional Schrödinger bridge, the regularization induces a state-dependent rate of killing and creation of probability mass, and its solution requires determining the Markov kernel of a reaction-diffusion partial differential equation. We derive this Markov kernel in closed form. Our solution recovers the heat kernel in the vanishing regularization (i.e., diffusion without reaction) limit, thereby recovering the solution of the conventional Schrödinger bridge. Our results enable the use of dynamic Sinkhorn recursion for computing the Schrödinger bridge with a quadratic state cost-to-go, which would otherwise be challenging to use in this setting. We deduce properties of the new kernel and explain its connections with certain exactly solvable models in quantum mechanics. △ Less

Submitted 16 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

arXiv:2404.10728 [pdf, other]

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

Authors: Hao-Lun Hsu, Weixin Wang, Miroslav Pajic, Pan Xu

Abstract: We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy and the Langevin M… ▽ More We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy and the Langevin Monte Carlo exploration (LMC) strategy respectively, which are flexible in design and easy to implement in practice. For a special class of parallel MDPs where the transition is (approximately) linear, we theoretically prove that both CoopTS-PHE and CoopTS-LMC achieve a $\widetilde{\mathcal{O}}(d^{3/2}H^2\sqrt{MK})$ regret bound with communication complexity $\widetilde{\mathcal{O}}(dHM^2)$, where $d$ is the feature dimension, $H$ is the horizon length, $M$ is the number of agents, and $K$ is the number of episodes. This is the first theoretical result for randomized exploration in cooperative MARL. We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (\textit{i.e.,} $N$-chain), a video game, and a real-world problem in energy systems. Our experimental results support that our framework can achieve better performance, even under conditions of misspecified transition models. Additionally, we establish a connection between our unified framework and the practical application of federated learning. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: 80 pages, 14 figures, 1 table. Hao-Lun Hsu and Weixin Wang contributed equally to this work

arXiv:2404.03828 [pdf, other]

Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

Authors: Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Robin Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu

Abstract: We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of {training} gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating \textit{outlier-efficient} associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an out… ▽ More We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of {training} gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating \textit{outlier-efficient} associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (${\rm Softmax}_1$): it is an approximation of the memory retrieval process of $\mathrm{OutEffHop}$. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like $\mathtt{Clipped\_Softmax}$ and $\mathtt{Gated\_Attention}$. Notably, $\mathrm{OutEffHop}$ achieves an average reduction of 22+\% in average kurtosis and 26+\% in the maximum infinity norm of model outputs across four models. Code is available at \href{https://github.com/MAGICS-LAB/OutEffHop}{GitHub}; models are on \href{https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f}{Hugging Face Hub}; future updates are on \href{https://arxiv.longhoe.net/abs/2404.03828}{arXiv}. △ Less

Submitted 26 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted at ICML 2024; v2 updated to camera-ready version; Code available at https://github.com/MAGICS-LAB/OutEffHop; Models are on Hugging Face: https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f

arXiv:2402.09397 [pdf, other]

On the Assessment of Bootstrap Intervals for Samples of Fixed Size

Authors: Weizhen Wang, Chongxiu Yu, Zhongzhan Zhang

Abstract: A reasonable confidence interval should have a confidence coefficient no less than the given nominal level and a small expected length to reliably and accurately estimate the parameter of interest, and the bootstrap interval is considered to be an efficient interval estimation technique. In this paper, we offer a first attempt at computing the coverage probability and expected length of a parametr… ▽ More A reasonable confidence interval should have a confidence coefficient no less than the given nominal level and a small expected length to reliably and accurately estimate the parameter of interest, and the bootstrap interval is considered to be an efficient interval estimation technique. In this paper, we offer a first attempt at computing the coverage probability and expected length of a parametric or percentile bootstrap interval by exact probabilistic calculation for any fixed sample size. This method is applied to the basic bootstrap intervals for functions of binomial proportions and a normal mean. None of these intervals, however, are found to have a correct confidence coefficient, which leads to illogical conclusions including that the bootstrap interval is narrower than the z-interval when estimating a normal mean. This raises a general question of how to utilize bootstrap intervals appropriately in practice since the sample size is typically fixed. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.00388 [pdf, other]

Cumulative Distribution Function based General Temporal Point Processes

Authors: Maolin Wang, Yu Pan, Zenglin Xu, Ruocheng Guo, Xiangyu Zhao, Wanyu Wang, Yiqi Wang, Zitao Liu, Langming Liu

Abstract: Temporal Point Processes (TPPs) hold a pivotal role in modeling event sequences across diverse domains, including social networking and e-commerce, and have significantly contributed to the advancement of recommendation systems and information retrieval strategies. Through the analysis of events such as user interactions and transactions, TPPs offer valuable insights into behavioral patterns, faci… ▽ More Temporal Point Processes (TPPs) hold a pivotal role in modeling event sequences across diverse domains, including social networking and e-commerce, and have significantly contributed to the advancement of recommendation systems and information retrieval strategies. Through the analysis of events such as user interactions and transactions, TPPs offer valuable insights into behavioral patterns, facilitating the prediction of future trends. However, accurately forecasting future events remains a formidable challenge due to the intricate nature of these patterns. The integration of Neural Networks with TPPs has ushered in the development of advanced deep TPP models. While these models excel at processing complex and nonlinear temporal data, they encounter limitations in modeling intensity functions, grapple with computational complexities in integral computations, and struggle to capture long-range temporal dependencies effectively. In this study, we introduce the CuFun model, representing a novel approach to TPPs that revolves around the Cumulative Distribution Function (CDF). CuFun stands out by uniquely employing a monotonic neural network for CDF representation, utilizing past events as a scaling factor. This innovation significantly bolsters the model's adaptability and precision across a wide range of data scenarios. Our approach addresses several critical issues inherent in traditional TPP modeling: it simplifies log-likelihood calculations, extends applicability beyond predefined density function forms, and adeptly captures long-range temporal patterns. Our contributions encompass the introduction of a pioneering CDF-based TPP model, the development of a methodology for incorporating past event information into future event prediction, and empirical validation of CuFun's effectiveness through extensive experimentation on synthetic and real-world datasets. △ Less

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2312.01386 [pdf, ps, other]

Regret Optimality of GP-UCB

Authors: Wenjia Wang, Xiaowei Zhang, Lu Zou

Abstract: Gaussian Process Upper Confidence Bound (GP-UCB) is one of the most popular methods for optimizing black-box functions with noisy observations, due to its simple structure and superior performance. Its empirical successes lead to a natural, yet unresolved question: Is GP-UCB regret optimal? In this paper, we offer the first generally affirmative answer to this important open question in the Bayesi… ▽ More Gaussian Process Upper Confidence Bound (GP-UCB) is one of the most popular methods for optimizing black-box functions with noisy observations, due to its simple structure and superior performance. Its empirical successes lead to a natural, yet unresolved question: Is GP-UCB regret optimal? In this paper, we offer the first generally affirmative answer to this important open question in the Bayesian optimization literature. We establish new upper bounds on both the simple and cumulative regret of GP-UCB when the objective function to optimize admits certain smoothness property. These upper bounds match the known minimax lower bounds (up to logarithmic factors independent of the feasible region's dimensionality) for optimizing functions with the same smoothness. Intriguingly, our findings indicate that, with the same level of exploration, GP-UCB can simultaneously achieve optimality in both simple and cumulative regret. The crux of our analysis hinges on a refined uniform error bound for online estimation of functions in reproducing kernel Hilbert spaces. This error bound, which we derive from empirical process theory, is of independent interest, and its potential applications may reach beyond the scope of this study. △ Less

Submitted 3 December, 2023; originally announced December 2023.

Comments: 23 pages

arXiv:2312.01162 [pdf, other]

Tests for Many Treatment Effects in Regression Discontinuity Panel Data Models

Authors: Likai Chen, Georg Keilbar, Liangjun Su, Weining Wang

Abstract: Numerous studies use regression discontinuity design (RDD) for panel data by assuming that the treatment effects are homogeneous across all individuals/groups and pooling the data together. It is unclear how to test for the significance of treatment effects when the treatments vary across individuals/groups and the error terms may exhibit complicated dependence structures. This paper examines the… ▽ More Numerous studies use regression discontinuity design (RDD) for panel data by assuming that the treatment effects are homogeneous across all individuals/groups and pooling the data together. It is unclear how to test for the significance of treatment effects when the treatments vary across individuals/groups and the error terms may exhibit complicated dependence structures. This paper examines the estimation and inference of multiple treatment effects when the errors are not independent and identically distributed, and the treatment effects vary across individuals/groups. We derive a simple analytical expression for approximating the variance-covariance structure of the treatment effect estimators under general dependence conditions and propose two test statistics, one is to test for the overall significance of the treatment effect and the other for the homogeneity of the treatment effects. We find that in the Gaussian approximations to the test statistics, the dependence structures in the data can be safely ignored due to the localized nature of the statistics. This has the important implication that the simulated critical values can be easily obtained. Simulations demonstrate our tests have superb size control and reasonable power performance in finite samples regardless of the presence of strong cross-section dependence or/and weak serial dependence in the data. We apply our tests to two datasets and find significant overall treatment effects in each case. △ Less

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2311.13958 [pdf, other]

Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework

Authors: **g**g Zheng, Wanglong Lu, Wenzhe Wang, Yankai Cao, Xiaoqin Zhang, Xianta Jiang

Abstract: Recently, numerous tensor singular value decomposition (t-SVD)-based tensor recovery methods have shown promise in processing visual data, such as color images and videos. However, these methods often suffer from severe performance degradation when confronted with tensor data exhibiting non-smooth changes. It has been commonly observed in real-world scenarios but ignored by the traditional t-SVD-b… ▽ More Recently, numerous tensor singular value decomposition (t-SVD)-based tensor recovery methods have shown promise in processing visual data, such as color images and videos. However, these methods often suffer from severe performance degradation when confronted with tensor data exhibiting non-smooth changes. It has been commonly observed in real-world scenarios but ignored by the traditional t-SVD-based methods. In this work, we introduce a novel tensor recovery model with a learnable tensor nuclear norm to address such a challenge. We develop a new optimization algorithm named the Alternating Proximal Multiplier Method (APMM) to iteratively solve the proposed tensor completion model. Theoretical analysis demonstrates the convergence of the proposed APMM to the Karush-Kuhn-Tucker (KKT) point of the optimization problem. In addition, we propose a multi-objective tensor recovery framework based on APMM to efficiently explore the correlations of tensor data across its various dimensions, providing a new perspective on extending the t-SVD-based method to higher-order tensor cases. Numerical experiments demonstrated the effectiveness of the proposed method in tensor completion. △ Less

Submitted 31 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

arXiv:2311.04731 [pdf, other]

Robust Best-arm Identification in Linear Bandits

Authors: Wei Wang, Sattar Vakili, Ilija Bogunovic

Abstract: We study the robust best-arm identification problem (RBAI) in the case of linear rewards. The primary objective is to identify a near-optimal robust arm, which involves selecting arms at every round and assessing their robustness by exploring potential adversarial actions. This approach is particularly relevant when utilizing a simulator and seeking to identify a robust solution for real-world tra… ▽ More We study the robust best-arm identification problem (RBAI) in the case of linear rewards. The primary objective is to identify a near-optimal robust arm, which involves selecting arms at every round and assessing their robustness by exploring potential adversarial actions. This approach is particularly relevant when utilizing a simulator and seeking to identify a robust solution for real-world transfer. To this end, we present an instance-dependent lower bound for the robust best-arm identification problem with linear rewards. Furthermore, we propose both static and adaptive bandit algorithms that achieve sample complexity that matches the lower bound. In synthetic experiments, our algorithms effectively identify the best robust arm and perform similarly to the oracle strategy. As an application, we examine diabetes care and the process of learning insulin dose recommendations that are robust with respect to inaccuracies in standard calculators. Our algorithms prove to be effective in identifying robust dosage values across various age ranges of patients. △ Less

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2310.11311 [pdf, other]

Elucidating The Design Space of Classifier-Guided Diffusion Generation

Authors: Jiajun Ma, Tianyang Hu, Wenjia Wang, Jiacheng Sun

Abstract: Guidance in conditional diffusion generation is of great importance for sample quality and controllability. However, existing guidance schemes are to be desired. On one hand, mainstream methods such as classifier guidance and classifier-free guidance both require extra training with labeled data, which is time-consuming and unable to adapt to new conditions. On the other hand, training-free method… ▽ More Guidance in conditional diffusion generation is of great importance for sample quality and controllability. However, existing guidance schemes are to be desired. On one hand, mainstream methods such as classifier guidance and classifier-free guidance both require extra training with labeled data, which is time-consuming and unable to adapt to new conditions. On the other hand, training-free methods such as universal guidance, though more flexible, have yet to demonstrate comparable performance. In this work, through a comprehensive investigation into the design space, we show that it is possible to achieve significant performance improvements over existing guidance schemes by leveraging off-the-shelf classifiers in a training-free fashion, enjoying the best of both worlds. Employing calibration as a general guideline, we propose several pre-conditioning techniques to better exploit pretrained off-the-shelf classifiers for guiding diffusion generation. Extensive experiments on ImageNet validate our proposed method, showing that state-of-the-art diffusion models (DDPM, EDM, DiT) can be further improved (up to 20%) using off-the-shelf classifiers with barely any extra computational cost. With the proliferation of publicly available pretrained classifiers, our proposed approach has great potential and can be readily scaled up to text-to-image generation tasks. The code is available at https://github.com/AlexMaOLS/EluCD/tree/main. △ Less

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2309.13482 [pdf, other]

A Unified Scheme of ResNet and Softmax

Authors: Zhao Song, Weixin Wang, Junze Yin

Abstract: Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical components supporting the functionality of LLMs but also are related to many other machine learning and theoretical computer science fields, including but not limited to… ▽ More Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical components supporting the functionality of LLMs but also are related to many other machine learning and theoretical computer science fields, including but not limited to image classification, object detection, semantic segmentation, and tensors. Previous research works studied these two concepts separately. In this paper, we provide a theoretical analysis of the regression problem: $\| \langle \exp(Ax) + A x , {\bf 1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$, where $A$ is a matrix in $\mathbb{R}^{n \times d}$, $b$ is a vector in $\mathbb{R}^n$, and ${\bf 1}_n$ is the $n$-dimensional vector whose entries are all $1$. This regression problem is a unified scheme that combines softmax regression and ResNet, which has never been done before. We derive the gradient, Hessian, and Lipschitz properties of the loss function. The Hessian is shown to be positive semidefinite, and its structure is characterized as the sum of a low-rank matrix and a diagonal matrix. This enables an efficient approximate Newton method. As a result, this unified scheme helps to connect two previously thought unrelated fields and provides novel insight into loss landscape and optimization for emerging over-parameterized neural networks, which is meaningful for future research in deep learning models. △ Less

Submitted 23 September, 2023; originally announced September 2023.

arXiv:2309.08489 [pdf, other]

Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network

Authors: Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang

Abstract: While standard speaker diarization attempts to answer the question "who spoken when", most of relevant applications in reality are more interested in determining "who spoken what". Whether it is the conventional modularized approach or the more recent end-to-end neural diarization (EEND), an additional automatic speech recognition (ASR) model and an orchestration algorithm are required to associat… ▽ More While standard speaker diarization attempts to answer the question "who spoken when", most of relevant applications in reality are more interested in determining "who spoken what". Whether it is the conventional modularized approach or the more recent end-to-end neural diarization (EEND), an additional automatic speech recognition (ASR) model and an orchestration algorithm are required to associate the speaker labels with recognized words. In this paper, we propose Word-level End-to-End Neural Diarization (WEEND) with auxiliary network, a multi-task learning algorithm that performs end-to-end ASR and speaker diarization in the same neural architecture. That is, while speech is being recognized, speaker labels are predicted simultaneously for each recognized word. Experimental results demonstrate that WEEND outperforms the turn-based diarization baseline system on all 2-speaker short-form scenarios and has the capability to generalize to audio lengths of 5 minutes. Although 3+speaker conversations are harder, we find that with enough in-domain training data, WEEND has the potential to deliver high quality diarized text. △ Less

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.07418 [pdf, other]

A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

Authors: Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin

Abstract: Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function… ▽ More Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$. Here $\mathsf{A} \in \mathbb{R}^{n^2 \times d^2}$ is Kronecker product between $A_1 \in \mathbb{R}^{n \times d}$ and $A_2 \in \mathbb{R}^{n \times d}$. $A_3$ is a matrix in $\mathbb{R}^{n \times d}$, $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$. The $X, Y \in \mathbb{R}^{d \times d}$ are variables we want to learn. $B \in \mathbb{R}^{n \times d}$ and $b_{j_0,i_0} \in \mathbb{R}$ is one entry at $j_0$-th row and $i_0$-th column of $B$, $Y_{*,i_0} \in \mathbb{R}^d$ is the $i_0$-column vector of $Y$, and $x \in \mathbb{R}^{d^2}$ is the vectorization of $X$. In a multi-layer LLM network, the matrix $B \in \mathbb{R}^{n \times d}$ can be viewed as the output of a layer, and $A_1= A_2 = A_3 \in \mathbb{R}^{n \times d}$ can be viewed as the input of a layer. The matrix version of $x$ can be viewed as $QK^\top$ and $Y$ can be viewed as $V$. We provide an iterative greedy algorithm to train loss function $L(X,Y)$ up $ε$ that runs in $\widetilde{O}( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2ω}) \log(1/ε) )$ time. Here ${\cal T}_{\mathrm{mat}}(a,b,c)$ denotes the time of multiplying $a \times b$ matrix another $b \times c$ matrix, and $ω\approx 2.37$ denotes the exponent of matrix multiplication. △ Less

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2309.04002 [pdf, other]

Total Variation Floodgate for Variable Importance Inference in Classification

Authors: Wenshuo Wang, Lucas Janson, Lihua Lei, Aaditya Ramdas

Abstract: Inferring variable importance is the key problem of many scientific studies, where researchers seek to learn the effect of a feature $X$ on the outcome $Y$ in the presence of confounding variables $Z$. Focusing on classification problems, we define the expected total variation (ETV), which is an intuitive and deterministic measure of variable importance that does not rely on any model context. We… ▽ More Inferring variable importance is the key problem of many scientific studies, where researchers seek to learn the effect of a feature $X$ on the outcome $Y$ in the presence of confounding variables $Z$. Focusing on classification problems, we define the expected total variation (ETV), which is an intuitive and deterministic measure of variable importance that does not rely on any model context. We then introduce algorithms for statistical inference on the ETV under design-based/model-X assumptions. These algorithms build on the floodgate notion for regression problems (Zhang and Janson 2020). The algorithms we introduce can leverage any user-specified regression function and produce asymptotic lower confidence bounds for the ETV. We show the effectiveness of our algorithms with simulations and a case study in conjoint analysis on the US general election. △ Less

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2308.02918 [pdf, other]

Spectral Ranking Inferences based on General Multiway Comparisons

Authors: Jianqing Fan, Zhipeng Lou, Weichen Wang, Mengxin Yu

Abstract: This paper studies the performance of the spectral method in the estimation and uncertainty quantification of the unobserved preference scores of compared entities in a general and more realistic setup. Specifically, the comparison graph consists of hyper-edges of possible heterogeneous sizes, and the number of comparisons can be as low as one for a given hyper-edge. Such a setting is pervasive in… ▽ More This paper studies the performance of the spectral method in the estimation and uncertainty quantification of the unobserved preference scores of compared entities in a general and more realistic setup. Specifically, the comparison graph consists of hyper-edges of possible heterogeneous sizes, and the number of comparisons can be as low as one for a given hyper-edge. Such a setting is pervasive in real applications, circumventing the need to specify the graph randomness and the restrictive homogeneous sampling assumption imposed in the commonly used Bradley-Terry-Luce (BTL) or Plackett-Luce (PL) models. Furthermore, in scenarios where the BTL or PL models are appropriate, we unravel the relationship between the spectral estimator and the Maximum Likelihood Estimator (MLE). We discover that a two-step spectral method, where we apply the optimal weighting estimated from the equal weighting vanilla spectral method, can achieve the same asymptotic efficiency as the MLE. Given the asymptotic distributions of the estimated preference scores, we also introduce a comprehensive framework to carry out both one-sample and two-sample ranking inferences, applicable to both fixed and random graph settings. It is noteworthy that this is the first time effective two-sample rank testing methods have been proposed. Finally, we substantiate our findings via comprehensive numerical simulations and subsequently apply our developed methodologies to perform statistical inferences for statistical journals and movie rankings. △ Less

Submitted 1 March, 2024; v1 submitted 5 August, 2023; originally announced August 2023.

Comments: 62 pages, 4 figures

arXiv:2307.08283 [pdf, other]

Complexity Matters: Rethinking the Latent Space for Generative Modeling

Authors: Tianyang Hu, Fei Chen, Haonan Wang, Jiawei Li, Wenjia Wang, Jiacheng Sun, Zhenguo Li

Abstract: In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion models the latent space induced by an encoder and generates images through a paired decoder. Although the selection of the latent space is empirically pivotal, determining the optimal choice and the process of identifying it remain unclear. In this study, we aim to shed light on t… ▽ More In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion models the latent space induced by an encoder and generates images through a paired decoder. Although the selection of the latent space is empirically pivotal, determining the optimal choice and the process of identifying it remain unclear. In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity. Our investigation starts with the classic generative adversarial networks (GANs). Inspired by the GAN training objective, we propose a novel "distance" between the latent and data distributions, whose minimization coincides with that of the generator complexity. The minimizer of this distance is characterized as the optimal data-dependent latent that most effectively capitalizes on the generator's capacity. Then, we consider parameterizing such a latent distribution by an encoder network and propose a two-stage training strategy called Decoupled Autoencoder (DAE), where the encoder is only updated in the first stage with an auxiliary decoder and then frozen in the second stage while the actual decoder is being trained. DAE can improve the latent distribution and as a result, improve the generative performance. Our theoretical analyses are corroborated by comprehensive experiments on various models such as VQGAN and Diffusion Transformer, where our modifications yield significant improvements in sample quality with decreased model complexity. △ Less

Submitted 29 October, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: Accepted to NeurIPS 2023 (Spotlight)

arXiv:2307.03340 [pdf, other]

Calibrating Car-Following Models via Bayesian Dynamic Regression

Authors: Chengyuan Zhang, Wenshuo Wang, Lijun Sun

Abstract: Car-following behavior modeling is critical for understanding traffic flow dynamics and develo** high-fidelity microscopic simulation models. Most existing impulse-response car-following models prioritize computational efficiency and interpretability by using a parsimonious nonlinear function based on immediate preceding state observations. However, this approach disregards historical informatio… ▽ More Car-following behavior modeling is critical for understanding traffic flow dynamics and develo** high-fidelity microscopic simulation models. Most existing impulse-response car-following models prioritize computational efficiency and interpretability by using a parsimonious nonlinear function based on immediate preceding state observations. However, this approach disregards historical information, limiting its ability to explain real-world driving data. Consequently, serially correlated residuals are commonly observed when calibrating these models with actual trajectory data, hindering their ability to capture complex and stochastic phenomena. To address this limitation, we propose a dynamic regression framework incorporating time series models, such as autoregressive processes, to capture error dynamics. This statistically rigorous calibration outperforms the simple assumption of independent errors and enables more accurate simulation and prediction by leveraging higher-order historical information. We validate the effectiveness of our framework using HighD and OpenACC data, demonstrating improved probabilistic simulations. In summary, our framework preserves the parsimonious nature of traditional car-following models while offering enhanced probabilistic simulations. The code of this work is available at https://github.com/Chengyuan-Zhang/IDM_Bayesian_Calibration. △ Less

Submitted 11 June, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

arXiv:2306.16415 [pdf, other]

On Practical Aspects of Aggregation Defenses against Data Poisoning Attacks

Authors: Wenxiao Wang, Soheil Feizi

Abstract: The increasing access to data poses both opportunities and risks in deep learning, as one can manipulate the behaviors of deep learning models with malicious training samples. Such attacks are known as data poisoning. Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving state-of-the-art results in certified poisoning ro… ▽ More The increasing access to data poses both opportunities and risks in deep learning, as one can manipulate the behaviors of deep learning models with malicious training samples. Such attacks are known as data poisoning. Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving state-of-the-art results in certified poisoning robustness. However, the practical implications of these approaches remain unclear. Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness. For evaluations, we use ImageNet resized to a resolution of 64 by 64 to enable evaluations at a larger scale than previous ones. Firstly, we demonstrate a simple yet practical approach to scaling base models, which improves the efficiency of training and inference for aggregation defenses. Secondly, we provide empirical evidence supporting the data-to-complexity ratio, i.e. the ratio between the data set size and sample complexity, as a practical estimation of the maximum number of base models that can be deployed while preserving accuracy. Last but not least, we point out how aggregation defenses boost poisoning robustness empirically through the poisoning overfitting phenomenon, which is the key underlying mechanism for the empirical poisoning robustness of aggregations. Overall, our findings provide valuable insights for practical implementations of aggregation defenses to mitigate the threat of data poisoning. △ Less

Submitted 28 June, 2023; originally announced June 2023.

Comments: 15 pages

arXiv:2306.16091 [pdf, other]

Adaptive functional principal components analysis

Authors: Sunny G. W. Wang, Valentin Patilea, Nicolas Klutchnikoff

Abstract: Functional data analysis almost always involves smoothing discrete observations into curves, because they are never observed in continuous time and rarely without error. Although smoothing parameters affect the subsequent inference, data-driven methods for selecting these parameters are not well-developed, frustrated by the difficulty of using all the information shared by curves while being compu… ▽ More Functional data analysis almost always involves smoothing discrete observations into curves, because they are never observed in continuous time and rarely without error. Although smoothing parameters affect the subsequent inference, data-driven methods for selecting these parameters are not well-developed, frustrated by the difficulty of using all the information shared by curves while being computationally efficient. On the one hand, smoothing individual curves in an isolated, albeit sophisticated way, ignores useful signals present in other curves. On the other hand, bandwidth selection by automatic procedures such as cross-validation after pooling all the curves together quickly become computationally unfeasible due to the large number of data points. In this paper we propose a new data-driven, adaptive kernel smoothing, specifically tailored for functional principal components analysis through the derivation of sharp, explicit risk bounds for the eigen-elements. The minimization of these quadratic risk bounds provide refined, yet computationally efficient bandwidth rules for each eigen-element separately. Both common and independent design cases are allowed. Rates of convergence for the estimators are derived. An extensive simulation study, designed in a versatile manner to closely mimic the characteristics of real data sets supports our methodological contribution. An illustration on a real data application is provided. △ Less

Submitted 16 April, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

MSC Class: 62R10; 62G08; 62M99

arXiv:2306.15616 [pdf, other]

Network-Adjusted Covariates for Community Detection

Authors: Yaofang Hu, Wanjie Wang

Abstract: Community detection is a crucial task in network analysis that can be significantly improved by incorporating subject-level information, i.e. covariates. However, current methods often struggle with selecting tuning parameters and analyzing low-degree nodes. In this paper, we introduce a novel method that addresses these challenges by constructing network-adjusted covariates, which leverage the ne… ▽ More Community detection is a crucial task in network analysis that can be significantly improved by incorporating subject-level information, i.e. covariates. However, current methods often struggle with selecting tuning parameters and analyzing low-degree nodes. In this paper, we introduce a novel method that addresses these challenges by constructing network-adjusted covariates, which leverage the network connections and covariates with a unique weight to each node based on the node's degree. Spectral clustering on network-adjusted covariates yields an exact recovery of community labels under certain conditions, which is tuning-free and computationally efficient. We present novel theoretical results about the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even in the presence of mis-specification and sparse communities with bounded degrees. Additionally, we establish a general lower bound for the community detection problem when both network and covariates are present, and it shows our method is optimal up to a constant factor. Our method outperforms existing approaches in simulations and a LastFM app user network, and provides interpretable community structures in a statistics publication citation network where $30\%$ of nodes are isolated. △ Less

Submitted 11 February, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: 48 pages

MSC Class: 91D30; 62F12; 91C20

arXiv:2306.12690 [pdf, other]

Uniform error bound for PCA matrix denoising

Authors: Xin T. Tong, Wanjie Wang, Yuguan Wang

Abstract: Principal component analysis (PCA) is a simple and popular tool for processing high-dimensional data. We investigate its effectiveness for matrix denoising. We consider the clean data are generated from a low-dimensional subspace, but masked by independent high-dimensional sub-Gaussian noises with standard deviation $σ$. Under the low-rank assumption on the clean data with a mild spectral gap as… ▽ More Principal component analysis (PCA) is a simple and popular tool for processing high-dimensional data. We investigate its effectiveness for matrix denoising. We consider the clean data are generated from a low-dimensional subspace, but masked by independent high-dimensional sub-Gaussian noises with standard deviation $σ$. Under the low-rank assumption on the clean data with a mild spectral gap assumption, we prove that the distance between each pair of PCA-denoised data point and the clean data point is uniformly bounded by $O(σ\log n)$. To illustrate the spectral gap assumption, we show it can be satisfied when the clean data are independently generated with a non-degenerate covariance matrix. We then provide a general lower bound for the error of the denoised data matrix, which indicates PCA denoising gives a uniform error bound that is rate-optimal. Furthermore, we examine how the error bound impacts downstream applications such as clustering and manifold learning. Numerical results validate our theoretical findings and reveal the importance of the uniform error. △ Less

Submitted 11 March, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: 23 pages, 2 figures

MSC Class: 62H25(primary); 62H30; 62R30

arXiv:2306.06726 [pdf, other]

Statistical inference for the penalized EM algorithm to test differential item functioning

Authors: Weimeng Wang, Yang Liu, Jeffrey R. Harring

Abstract: Recent advancements in testing differential item functioning (DIF) have greatly relaxed restrictions made by the conventional multiple group item response theory (IRT) model with respect to the number of grou** variables and the assumption of predefined DIF-free anchor items. The application of the $L_1$ penalty in DIF detection has shown promising results in identifying a DIF item without a pri… ▽ More Recent advancements in testing differential item functioning (DIF) have greatly relaxed restrictions made by the conventional multiple group item response theory (IRT) model with respect to the number of grou** variables and the assumption of predefined DIF-free anchor items. The application of the $L_1$ penalty in DIF detection has shown promising results in identifying a DIF item without a priori knowledge on anchor items while allowing the simultaneous investigation of multiple grou** variables. The least absolute shrinkage and selection operator (LASSO) is added directly to the loss function to encourage variable sparsity such that DIF parameters of anchor items are penalized to be zero. Therefore, no predefined anchor items are needed. However, DIF detection using LASSO requires a non-trivial model selection consistency assumption and is difficult to draw statistical inference. Given the importance of identifying DIF items in test development, this study aims to apply the decorrelated score test to test DIF once the penalized method is used. Unlike the existing regularized DIF method which is unable to test the statistical significance of a DIF item selected by LASSO, the decorrelated score test requires weaker assumptions and is able to provide asymptotically valid inference to test DIF. Additionally, the deccorrelated score function can be used to construct asymptotically unbiased normal and efficient DIF parameter estimates via a one-step correction. The performance of the proposed decorrelated score test and the one-step estimator are evaluated via a Monte Carlo simulation study. △ Less

Submitted 11 June, 2023; originally announced June 2023.

Comments: There are in total 47 pages including the title page, main document, references, figures, and tables. The paper will be presented at the ICSA 2023 at Anna Arbor

arXiv:2306.00196 [pdf, other]

Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption

Authors: Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang

Abstract: We study the infinite-horizon restless bandit problem with the average reward criterion, in both discrete-time and continuous-time settings. A fundamental goal is to efficiently compute policies that achieve a diminishing optimality gap as the number of arms, $N$, grows large. Existing results on asymptotic optimality all rely on the uniform global attractor property (UGAP), a complex and challeng… ▽ More We study the infinite-horizon restless bandit problem with the average reward criterion, in both discrete-time and continuous-time settings. A fundamental goal is to efficiently compute policies that achieve a diminishing optimality gap as the number of arms, $N$, grows large. Existing results on asymptotic optimality all rely on the uniform global attractor property (UGAP), a complex and challenging-to-verify assumption. In this paper, we propose a general, simulation-based framework, Follow-the-Virtual-Advice, that converts any single-armed policy into a policy for the original $N$-armed problem. This is done by simulating the single-armed policy on each arm and carefully steering the real state towards the simulated state. Our framework can be instantiated to produce a policy with an $O(1/\sqrt{N})$ optimality gap. In the discrete-time setting, our result holds under a simpler synchronization assumption, which covers some problem instances that violate UGAP. More notably, in the continuous-time setting, we do not require \emph{any} additional assumptions beyond the standard unichain condition. In both settings, our work is the first asymptotic optimality result that does not require UGAP. △ Less

Submitted 16 January, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

Comments: NeurIPS 2023. 35 pages, 8 figures

MSC Class: 90C40 ACM Class: G.3; I.6

arXiv:2305.03531 [pdf, other]

Random Smoothing Regularization in Kernel Gradient Descent Learning

Authors: Liang Ding, Tianyang Hu, Jiahang Jiang, Donghao Li, Wenjia Wang, Yuan Yao

Abstract: Random smoothing data augmentation is a unique form of regularization that can prevent overfitting by introducing noise to the input data, encouraging the model to learn more generalized features. Despite its success in various applications, there has been a lack of systematic study on the regularization ability of random smoothing. In this paper, we aim to bridge this gap by presenting a framewor… ▽ More Random smoothing data augmentation is a unique form of regularization that can prevent overfitting by introducing noise to the input data, encouraging the model to learn more generalized features. Despite its success in various applications, there has been a lack of systematic study on the regularization ability of random smoothing. In this paper, we aim to bridge this gap by presenting a framework for random smoothing regularization that can adaptively and effectively learn a wide range of ground truth functions belonging to the classical Sobolev spaces. Specifically, we investigate two underlying function spaces: the Sobolev space of low intrinsic dimension, which includes the Sobolev space in $D$-dimensional Euclidean space or low-dimensional sub-manifolds as special cases, and the mixed smooth Sobolev space with a tensor structure. By using random smoothing regularization as novel convolution-based smoothing kernels, we can attain optimal convergence rates in these cases using a kernel gradient descent algorithm, either with early stop** or weight decay. It is noteworthy that our estimator can adapt to the structural assumptions of the underlying data and avoid the curse of dimensionality. This is achieved through various choices of injected noise distributions such as Gaussian, Laplace, or general polynomial noises, allowing for broad adaptation to the aforementioned structural assumptions of the underlying data. The convergence rate depends only on the effective dimension, which may be significantly smaller than the actual data dimension. We conduct numerical experiments on simulated data to validate our theoretical results. △ Less

Submitted 11 May, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

arXiv:2303.10079 [pdf, other]

doi 10.1007/s11336-023-09936-3

What Can We Learn from a Semiparametric Factor Analysis of Item Responses and Response Time? An Illustration with the PISA 2015 Data

Authors: Yang Liu, Weimeng Wang

Abstract: It is widely believed that a joint factor analysis of item responses and response time (RT) may yield more precise ability scores that are conventionally predicted from responses only. For this purpose, a simple-structure factor model is often preferred as it only requires specifying an additional measurement model for item-level RT while leaving the original item response theory (IRT) model for r… ▽ More It is widely believed that a joint factor analysis of item responses and response time (RT) may yield more precise ability scores that are conventionally predicted from responses only. For this purpose, a simple-structure factor model is often preferred as it only requires specifying an additional measurement model for item-level RT while leaving the original item response theory (IRT) model for responses intact. The added speed factor indicated by item-level RT correlates with the ability factor in the IRT model, allowing RT data to carry additional information about respondents' ability. However, parametric simple-structure factor models are often restrictive and fit poorly to empirical data, which prompts under-confidence in the suitablity of a simple factor structure. In the present paper, we analyze the 2015 Programme for International Student Assessment (PISA) mathematics data using a semiparametric simple-structure model. We conclude that a simple factor structure attains a decent fit after further parametric assumptions in the measurement model are sufficiently relaxed. Furthermore, our semiparametric model implies that the association between latent ability and speed/slowness is strong in the population, but the form of association is nonlinear. It follows that scoring based on the fitted model can substantially improve the precision of ability scores. △ Less

Submitted 30 July, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

arXiv:2302.03684 [pdf, other]

Temporal Robustness against Data Poisoning

Authors: Wenxiao Wang, Soheil Feizi

Abstract: Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data. Existing threat models of data poisoning center around a single metric, the number of poisoned samples. In consequence, if attackers can poison more samples than expected with affordable overhead, as in many practical scenarios, they may be able to render existi… ▽ More Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data. Existing threat models of data poisoning center around a single metric, the number of poisoned samples. In consequence, if attackers can poison more samples than expected with affordable overhead, as in many practical scenarios, they may be able to render existing defenses ineffective in a short time. To address this issue, we leverage timestamps denoting the birth dates of data, which are often available but neglected in the past. Benefiting from these timestamps, we propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how long an attack started in advance and how long an attack lasted. Using these metrics, we define the notions of temporal robustness against data poisoning, providing a meaningful sense of protection even with unbounded amounts of poisoned samples when the attacks are temporally bounded. We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal threat model for data poisoning. △ Less

Submitted 6 December, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2301.10387 [pdf, other]

Mesh-clustered Gaussian process emulator for partial differential equation boundary value problems

Authors: Chih-Li Sung, Wenjia Wang, Liang Ding, Xingjian Wang

Abstract: Partial differential equations (PDEs) have become an essential tool for modeling complex physical systems. Such equations are typically solved numerically via mesh-based methods, such as finite element methods, with solutions over the spatial domain. However, obtaining these solutions are often prohibitively costly, limiting the feasibility of exploring parameters in PDEs. In this paper, we propos… ▽ More Partial differential equations (PDEs) have become an essential tool for modeling complex physical systems. Such equations are typically solved numerically via mesh-based methods, such as finite element methods, with solutions over the spatial domain. However, obtaining these solutions are often prohibitively costly, limiting the feasibility of exploring parameters in PDEs. In this paper, we propose an efficient emulator that simultaneously predicts the solutions over the spatial domain, with theoretical justification of its uncertainty quantification. The novelty of the proposed method lies in the incorporation of the mesh node coordinates into the statistical model. In particular, the proposed method segments the mesh nodes into multiple clusters via a Dirichlet process prior and fits Gaussian process models with the same hyperparameters in each of them. Most importantly, by revealing the underlying clustering structures, the proposed method can provide valuable insights into qualitative features of the resulting dynamics that can be used to guide further investigations. Real examples are demonstrated to show that our proposed method has smaller prediction errors than its main competitors, with competitive computation time, and identifies interesting clusters of mesh nodes that possess physical significance, such as satisfying boundary conditions. An R package for the proposed methodology is provided in an open repository. △ Less

Submitted 14 February, 2024; v1 submitted 24 January, 2023; originally announced January 2023.

arXiv:2301.00092 [pdf, ps, other]

Inference on Time Series Nonparametric Conditional Moment Restrictions Using General Sieves

Authors: Xiaohong Chen, Yuan Liao, Weichen Wang

Abstract: General nonlinear sieve learnings are classes of nonlinear sieves that can approximate nonlinear functions of high dimensional variables much more flexibly than various linear sieves (or series). This paper considers general nonlinear sieve quasi-likelihood ratio (GN-QLR) based inference on expectation functionals of time series data, where the functionals of interest are based on some nonparametr… ▽ More General nonlinear sieve learnings are classes of nonlinear sieves that can approximate nonlinear functions of high dimensional variables much more flexibly than various linear sieves (or series). This paper considers general nonlinear sieve quasi-likelihood ratio (GN-QLR) based inference on expectation functionals of time series data, where the functionals of interest are based on some nonparametric function that satisfy conditional moment restrictions and are learned using multilayer neural networks. While the asymptotic normality of the estimated functionals depends on some unknown Riesz representer of the functional space, we show that the optimally weighted GN-QLR statistic is asymptotically Chi-square distributed, regardless whether the expectation functional is regular (root-$n$ estimable) or not. This holds when the data are weakly dependent beta-mixing condition. We apply our method to the off-policy evaluation in reinforcement learning, by formulating the Bellman equation into the conditional moment restriction framework, so that we can make inference about the state-specific value functional using the proposed GN-QLR method with time series data. In addition, estimating the averaged partial means and averaged partial derivatives of nonparametric instrumental variables and quantile IV models are also presented as leading examples. Finally, a Monte Carlo study shows the finite sample performance of the procedure △ Less

Submitted 2 January, 2023; v1 submitted 30 December, 2022; originally announced January 2023.

arXiv:2212.10701 [pdf, other]

A Non-Asymptotic Analysis of Oversmoothing in Graph Neural Networks

Authors: Xinyi Wu, Zhengdao Chen, William Wang, Ali Jadbabaie

Abstract: Oversmoothing is a central challenge of building more powerful Graph Neural Networks (GNNs). While previous works have only demonstrated that oversmoothing is inevitable when the number of graph convolutions tends to infinity, in this paper, we precisely characterize the mechanism behind the phenomenon via a non-asymptotic analysis. Specifically, we distinguish between two different effects when a… ▽ More Oversmoothing is a central challenge of building more powerful Graph Neural Networks (GNNs). While previous works have only demonstrated that oversmoothing is inevitable when the number of graph convolutions tends to infinity, in this paper, we precisely characterize the mechanism behind the phenomenon via a non-asymptotic analysis. Specifically, we distinguish between two different effects when applying graph convolutions -- an undesirable mixing effect that homogenizes node representations in different classes, and a desirable denoising effect that homogenizes node representations in the same class. By quantifying these two effects on random graphs sampled from the Contextual Stochastic Block Model (CSBM), we show that oversmoothing happens once the mixing effect starts to dominate the denoising effect, and the number of layers required for this transition is $O(\log N/\log (\log N))$ for sufficiently dense graphs with $N$ nodes. We also extend our analysis to study the effects of Personalized PageRank (PPR), or equivalently, the effects of initial residual connections on oversmoothing. Our results suggest that while PPR mitigates oversmoothing at deeper layers, PPR-based architectures still achieve their best performance at a shallow depth and are outperformed by the graph convolution approach on certain graphs. Finally, we support our theoretical results with numerical experiments, which further suggest that the oversmoothing phenomenon observed in practice can be magnified by the difficulty of optimizing deep GNN models. △ Less

Submitted 28 February, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: Accepted by the 11th International Conference on Learning Representations (ICLR 2023)

arXiv:2212.00197 [pdf, other]

GAN-MC: a Variance Reduction Tool for Derivatives Pricing

Authors: Weishi Wang

Abstract: We propose a parameter-free model for estimating the price or valuation of financial derivatives like options, forwards and futures using non-supervised learning networks and Monte Carlo. Although some arbitrage-based pricing formula performs greatly on derivatives pricing like Black-Scholes on option pricing, generative model-based Monte Carlo estimation(GAN-MC) will be more accurate and holds mo… ▽ More We propose a parameter-free model for estimating the price or valuation of financial derivatives like options, forwards and futures using non-supervised learning networks and Monte Carlo. Although some arbitrage-based pricing formula performs greatly on derivatives pricing like Black-Scholes on option pricing, generative model-based Monte Carlo estimation(GAN-MC) will be more accurate and holds more generalizability when lack of training samples on derivatives, underlying asset's price dynamics are unknown or the no-arbitrage conditions can not be solved analytically. We analyze the variance reduction feature of our model and to validate the potential value of the pricing model, we collect real world market derivatives data and show that our model outperforms other arbitrage-based pricing models and non-parametric machine learning models. For comparison, we estimate the price of derivatives using Black-Scholes model, ordinary least squares, radial basis function networks, multilayer perception regression, projection pursuit regression and Monte Carlo only models. △ Less

Submitted 30 November, 2022; originally announced December 2022.

Comments: 19 pages, 2 figures, 7 tables

arXiv:2211.15051 [pdf, other]

Subgroup analysis for the functional linear model

Authors: Yifan Sun, Ziyi Liu, Wu Wang

Abstract: Classical functional linear regression models the relationship between a scalar response and a functional covariate, where the coefficient function is assumed to be identical for all subjects. In this paper, the classical model is extended to allow heterogeneous coefficient functions across different subgroups of subjects. The greatest challenge is that the subgroup structure is usually unknown to… ▽ More Classical functional linear regression models the relationship between a scalar response and a functional covariate, where the coefficient function is assumed to be identical for all subjects. In this paper, the classical model is extended to allow heterogeneous coefficient functions across different subgroups of subjects. The greatest challenge is that the subgroup structure is usually unknown to us. To this end, we develop a penalization-based approach which innovatively applies the penalized fusion technique to simultaneously determine the number and structure of subgroups and coefficient functions within each subgroup. An effective computational algorithm is derived. We also establish the oracle properties and estimation consistency. Extensive numerical simulations demonstrate its superiority compared to several competing methods. The analysis of an air quality dataset leads to interesting findings and improved predictions. △ Less

Submitted 27 November, 2022; originally announced November 2022.

Comments: 24 pages, 9 figures

arXiv:2211.12095 [pdf, other]

Asymptotic Properties of the Synthetic Control Method

Authors: Xiaomeng Zhang, Wendun Wang, Xinyu Zhang

Abstract: This paper provides new insights into the asymptotic properties of the synthetic control method (SCM). We show that the synthetic control (SC) weight converges to a limiting weight that minimizes the mean squared prediction risk of the treatment-effect estimator when the number of pretreatment periods goes to infinity, and we also quantify the rate of convergence. Observing the link between the SC… ▽ More This paper provides new insights into the asymptotic properties of the synthetic control method (SCM). We show that the synthetic control (SC) weight converges to a limiting weight that minimizes the mean squared prediction risk of the treatment-effect estimator when the number of pretreatment periods goes to infinity, and we also quantify the rate of convergence. Observing the link between the SCM and model averaging, we further establish the asymptotic optimality of the SC estimator under imperfect pretreatment fit, in the sense that it achieves the lowest possible squared prediction error among all possible treatment effect estimators that are based on an average of control units, such as matching, inverse probability weighting and difference-in-differences. The asymptotic optimality holds regardless of whether the number of control units is fixed or divergent. Thus, our results provide justifications for the SCM in a wide range of applications. The theoretical results are verified via simulations. △ Less

Submitted 22 November, 2022; originally announced November 2022.

arXiv:2211.11957 [pdf, other]

Ranking Inferences Based on the Top Choice of Multiway Comparisons

Authors: Jianqing Fan, Zhipeng Lou, Weichen Wang, Mengxin Yu

Abstract: This paper considers ranking inference of $n$ items based on the observed data on the top choice among $M$ randomly selected items at each trial. This is a useful modification of the Plackett-Luce model for $M$-way ranking with only the top choice observed and is an extension of the celebrated Bradley-Terry-Luce model that corresponds to $M=2$. Under a uniform sampling scheme in which any $M$ dist… ▽ More This paper considers ranking inference of $n$ items based on the observed data on the top choice among $M$ randomly selected items at each trial. This is a useful modification of the Plackett-Luce model for $M$-way ranking with only the top choice observed and is an extension of the celebrated Bradley-Terry-Luce model that corresponds to $M=2$. Under a uniform sampling scheme in which any $M$ distinguished items are selected for comparisons with probability $p$ and the selected $M$ items are compared $L$ times with multinomial outcomes, we establish the statistical rates of convergence for underlying $n$ preference scores using both $\ell_2$-norm and $\ell_\infty$-norm, with the minimum sampling complexity. In addition, we establish the asymptotic normality of the maximum likelihood estimator that allows us to construct confidence intervals for the underlying scores. Furthermore, we propose a novel inference framework for ranking items through a sophisticated maximum pairwise difference statistic whose distribution is estimated via a valid Gaussian multiplier bootstrap. The estimated distribution is then used to construct simultaneous confidence intervals for the differences in the preference scores and the ranks of individual items. They also enable us to address various inference questions on the ranks of these items. Extensive simulation studies lend further support to our theoretical results. A real data application illustrates the usefulness of the proposed methods convincingly. △ Less

Submitted 5 January, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: In this paper, we build simultaneous confidence intervals for ranks through multiway comparisons

arXiv:2211.01547 [pdf, other]

A Systematic Paradigm for Detecting, Surfacing, and Characterizing Heterogeneous Treatment Effects (HTE)

Authors: John Cai, Weinan Wang

Abstract: To effectively optimize and personalize treatments, it is necessary to investigate the heterogeneity of treatment effects. With the wide range of users being treated over many online controlled experiments, the typical approach of manually investigating each dimension of heterogeneity becomes overly cumbersome and prone to subjective human biases. We need an efficient way to search through thousan… ▽ More To effectively optimize and personalize treatments, it is necessary to investigate the heterogeneity of treatment effects. With the wide range of users being treated over many online controlled experiments, the typical approach of manually investigating each dimension of heterogeneity becomes overly cumbersome and prone to subjective human biases. We need an efficient way to search through thousands of experiments with hundreds of target covariates and hundreds of breakdown dimensions. In this paper, we propose a systematic paradigm for detecting, surfacing and characterizing heterogeneous treatment effects. First, we detect if treatment effect variation is present in an experiment, prior to specifying any breakdowns. Second, we surface the most relevant dimensions for heterogeneity. Finally, we characterize the heterogeneity beyond just the conditional average treatment effects (CATE) by studying the conditional distributions of the estimated individual treatment effects. We show the effectiveness of our methods using simulated data and empirical studies. △ Less

Submitted 2 November, 2022; originally announced November 2022.

Comments: 6 pages, 6 figures

Journal ref: 2022 Conference on Digital Experimentation

arXiv:2211.00268 [pdf, other]

Stacking designs: designing multi-fidelity computer experiments with target predictive accuracy

Authors: Chih-Li Sung, Yi Ji, Simon Mak, Wenjia Wang, Tao Tang

Abstract: In an era where scientific experiments can be very costly, multi-fidelity emulators provide a useful tool for cost-efficient predictive scientific computing. For scientific applications, the experimenter is often limited by a tight computational budget, and thus wishes to (i) maximize predictive power of the multi-fidelity emulator via a careful design of experiments, and (ii) ensure this model ac… ▽ More In an era where scientific experiments can be very costly, multi-fidelity emulators provide a useful tool for cost-efficient predictive scientific computing. For scientific applications, the experimenter is often limited by a tight computational budget, and thus wishes to (i) maximize predictive power of the multi-fidelity emulator via a careful design of experiments, and (ii) ensure this model achieves a desired error tolerance with some notion of confidence. Existing design methods, however, do not jointly tackle objectives (i) and (ii). We propose a novel stacking design approach that addresses both goals. A multi-level reproducing kernel Hilbert space (RKHS) interpolator is first introduced to build the emulator, under which our stacking design provides a sequential approach for designing multi-fidelity runs such that a desired prediction error of $ε> 0$ is met under regularity assumptions. We then prove a novel cost complexity theorem that, under this multi-level interpolator, establishes a bound on the computation cost (for training data simulation) needed to achieve a prediction bound of $ε$. This result provides novel insights on conditions under which the proposed multi-fidelity approach improves upon a conventional RKHS interpolator which relies on a single fidelity level. Finally, we demonstrate the effectiveness of stacking designs in a suite of simulation experiments and an application to finite element analysis. △ Less

Submitted 27 October, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.03728 [pdf, other]

Dynamic Latent Separation for Deep Learning

Authors: Yi-Lin Tuan, Zih-Yun Chiu, William Yang Wang

Abstract: A core problem in machine learning is to learn expressive latent variables for model prediction on complex data that involves multiple sub-components in a flexible and interpretable fashion. Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications. The key idea is to dynamically distance data samples in the latent sp… ▽ More A core problem in machine learning is to learn expressive latent variables for model prediction on complex data that involves multiple sub-components in a flexible and interpretable fashion. Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications. The key idea is to dynamically distance data samples in the latent space and thus enhance the output diversity. Our dynamic latent separation method, inspired by atomic physics, relies on the jointly learned structures of each data sample, which also reveal the importance of each sub-component for distinguishing data samples. This approach, atom modeling, requires no supervision of the latent space and allows us to learn extra partially interpretable representations besides the original goal of a model. We empirically demonstrate that the algorithm also enhances the performance of small to larger-scale models in various classification and generation problems. △ Less

Submitted 11 February, 2024; v1 submitted 7 October, 2022; originally announced October 2022.

arXiv:2209.08729 [pdf]

Data-driven and machine-learning based prediction of wave propagation behavior in dam-break flood

Authors: Changli Li, Zheng Han, Yange Li, Ming Li, Weidong Wang

Abstract: The computational prediction of wave propagation in dam-break floods is a long-standing problem in hydrodynamics and hydrology. Until now, conventional numerical models based on Saint-Venant equations are the dominant approaches. Here we show that a machine learning model that is well-trained on a minimal amount of data, can help predict the long-term dynamic behavior of a one-dimensional dam-brea… ▽ More The computational prediction of wave propagation in dam-break floods is a long-standing problem in hydrodynamics and hydrology. Until now, conventional numerical models based on Saint-Venant equations are the dominant approaches. Here we show that a machine learning model that is well-trained on a minimal amount of data, can help predict the long-term dynamic behavior of a one-dimensional dam-break flood with satisfactory accuracy. For this purpose, we solve the Saint-Venant equations for a one-dimensional dam-break flood scenario using the Lax-Wendroff numerical scheme and train the reservoir computing echo state network (RC-ESN) with the dataset by the simulation results consisting of time-sequence flow depths. We demonstrate a good prediction ability of the RC-ESN model, which ahead predicts wave propagation behavior 286 time-steps in the dam-break flood with a root mean square error (RMSE) smaller than 0.01, outperforming the conventional long short-term memory (LSTM) model which reaches a comparable RMSE of only 81 time-steps ahead. To show the performance of the RC-ESN model, we also provide a sensitivity analysis of the prediction accuracy concerning the key parameters including training set size, reservoir size, and spectral radius. Results indicate that the RC-ESN are less dependent on the training set size, a medium reservoir size K=1200~2600 is sufficient. We confirm that the spectral radius \r{ho} shows a complex influence on the prediction accuracy and suggest a smaller spectral radius \r{ho} currently. By changing the initial flow depth of the dam break, we also obtained the conclusion that the prediction horizon of RC-ESN is larger than that of LSTM. △ Less

Submitted 18 September, 2022; originally announced September 2022.

arXiv:2209.05788 [pdf, other]

Empirical Bayes Multistage Testing for Large-Scale Experiments

Authors: Hui Xu, Weinan Wang

Abstract: Modern application of A/B tests is challenging due to its large scale in various dimensions, which demands flexibility to deal with multiple testing sequentially. The state-of-the-art practice first reduces the observed data stream to always-valid p-values, and then chooses a cut-off as in conventional multiple testing schemes. Here we propose an alternative method called AMSET (adaptive multistag… ▽ More Modern application of A/B tests is challenging due to its large scale in various dimensions, which demands flexibility to deal with multiple testing sequentially. The state-of-the-art practice first reduces the observed data stream to always-valid p-values, and then chooses a cut-off as in conventional multiple testing schemes. Here we propose an alternative method called AMSET (adaptive multistage empirical Bayes test) by incorporating historical data in decision-making to achieve efficiency gains while retaining marginal false discovery rate (mFDR) control that is immune to peeking. We also show that a fully data-driven estimation in AMSET performs robustly to various simulation and real data settings at a large mobile app social network company. △ Less

Submitted 13 September, 2022; originally announced September 2022.

arXiv:2208.13074 [pdf, other]

$\ell^2$ Inference for Change Points in High-Dimensional Time Series via a Two-Way MOSUM

Authors: Jiaqi Li, Likai Chen, Weining Wang, Wei Biao Wu

Abstract: We propose an inference method for detecting multiple change points in high-dimensional time series, targeting dense or spatially clustered signals. Our method aggregates moving sum (MOSUM) statistics cross-sectionally by an $\ell^2$-norm and maximizes them over time. We further introduce a novel Two-Way MOSUM, which utilizes spatial-temporal moving regions to search for breaks, with the added adv… ▽ More We propose an inference method for detecting multiple change points in high-dimensional time series, targeting dense or spatially clustered signals. Our method aggregates moving sum (MOSUM) statistics cross-sectionally by an $\ell^2$-norm and maximizes them over time. We further introduce a novel Two-Way MOSUM, which utilizes spatial-temporal moving regions to search for breaks, with the added advantage of enhancing testing power when breaks occur in only a few groups. The limiting distribution of an $\ell^2$-aggregated statistic is established for testing break existence by extending a high-dimensional Gaussian approximation theorem to spatial-temporal non-stationary processes. Simulation studies exhibit promising performance of our test in detecting non-sparse weak signals. Two applications, analyzing equity returns and COVID-19 cases in the United States, showcase the real-world relevance of our proposed algorithms. △ Less

Submitted 3 July, 2023; v1 submitted 27 August, 2022; originally announced August 2022.

Comments: 111 pages, 10 figures

arXiv:2208.10910 [pdf, other]

A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression

Authors: Youngseok Kim, Wei Wang, Peter Carbonetto, Matthew Stephens

Abstract: We introduce a new empirical Bayes approach for large-scale multiple linear regression. Our approach combines two key ideas: (i) the use of flexible "adaptive shrinkage" priors, which approximate the nonparametric family of scale mixture of normal distributions by a finite mixture of normal distributions; and (ii) the use of variational approximations to efficiently estimate prior hyperparameters… ▽ More We introduce a new empirical Bayes approach for large-scale multiple linear regression. Our approach combines two key ideas: (i) the use of flexible "adaptive shrinkage" priors, which approximate the nonparametric family of scale mixture of normal distributions by a finite mixture of normal distributions; and (ii) the use of variational approximations to efficiently estimate prior hyperparameters and compute approximate posteriors. Combining these two ideas results in fast and flexible methods, with computational speed comparable to fast penalized regression methods such as the Lasso, and with competitive prediction accuracy across a wide range of scenarios. Further, we provide new results that establish conceptual connections between our empirical Bayes methods and penalized methods. Specifically, we show that the posterior mean from our method solves a penalized regression problem, with the form of the penalty function being learned from the data by directly solving an optimization problem (rather than being tuned by cross-validation). Our methods are implemented in an R package, mr.ash.alpha, available from https://github.com/stephenslab/mr.ash.alpha. △ Less

Submitted 12 June, 2024; v1 submitted 23 August, 2022; originally announced August 2022.

arXiv:2208.03309 [pdf, other]

Lethal Dose Conjecture on Data Poisoning

Authors: Wenxiao Wang, Alexander Levine, Soheil Feizi

Abstract: Data poisoning considers an adversary that distorts the training set of machine learning algorithms for malicious purposes. In this work, we bring to light one conjecture regarding the fundamentals of data poisoning, which we call the Lethal Dose Conjecture. The conjecture states: If $n$ clean training samples are needed for accurate predictions, then in a size-$N$ training set, only $Θ(N/n)$ pois… ▽ More Data poisoning considers an adversary that distorts the training set of machine learning algorithms for malicious purposes. In this work, we bring to light one conjecture regarding the fundamentals of data poisoning, which we call the Lethal Dose Conjecture. The conjecture states: If $n$ clean training samples are needed for accurate predictions, then in a size-$N$ training set, only $Θ(N/n)$ poisoned samples can be tolerated while ensuring accuracy. Theoretically, we verify this conjecture in multiple cases. We also offer a more general perspective of this conjecture through distribution discrimination. Deep Partition Aggregation (DPA) and its extension, Finite Aggregation (FA) are recent approaches for provable defenses against data poisoning, where they predict through the majority vote of many base models trained from different subsets of training set using a given learner. The conjecture implies that both DPA and FA are (asymptotically) optimal -- if we have the most data-efficient learner, they can turn it into one of the most robust defenses against data poisoning. This outlines a practical approach to develo** stronger defenses against poisoning via finding data-efficient learners. Empirically, as a proof of concept, we show that by simply using different data augmentations for base learners, we can respectively double and triple the certified robustness of DPA on CIFAR-10 and GTSRB without sacrificing accuracy. △ Less

Submitted 18 October, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:2208.00257

Covariate-Assisted Community Detection on Sparse Networks

Authors: Yaofang Hu, Wanjie Wang

Abstract: Community detection is an important problem when processing network data. Traditionally, this is done by exploiting the connections between nodes, but connections can be too sparse to detect communities in many real datasets. Node covariates can be used to assist community detection; see Binkiewicz et al. (2017); Weng and Feng (2022); Yan and Sarkar (2021); Yang et al. (2013). However, how to comb… ▽ More Community detection is an important problem when processing network data. Traditionally, this is done by exploiting the connections between nodes, but connections can be too sparse to detect communities in many real datasets. Node covariates can be used to assist community detection; see Binkiewicz et al. (2017); Weng and Feng (2022); Yan and Sarkar (2021); Yang et al. (2013). However, how to combine covariates with network connections is challenging, because covariates may be high-dimensional and inconsistent with community labels. To study the relationship between covariates and communities, we propose the degree corrected stochastic block model with node covariates (DCSBM-NC). It allows degree heterogeneity among communities and inconsistent labels between communities and covariates. Based on DCSBM-NC, we design the adjusted neighbor-covariate (ANC) data matrix, which leverages covariate information to assist community detection. We then propose the covariate-assisted spectral clustering on ratios of singular vectors (CA-SCORE) method on the ANC matrix. We prove that CA-SCORE successfully recovers community labels when 1) the network is relatively dense; 2) the covariate class labels match the community labels; 3) the data is a mixture of 1) and 2). CA-SCORE has good performance on synthetic and real datasets. The algorithm is implemented in the R(R Core Team (2021)) package CASCORE. △ Less

Submitted 27 June, 2023; v1 submitted 30 July, 2022; originally announced August 2022.

Comments: The theory and algorithm are developed very differently, and so a new submission is in 2306.15616

MSC Class: 91D30; 62F12; 91C20

arXiv:2208.00048 [pdf, other]

Exponential canonical correlation analysis with orthogonal variation

Authors: Dongbang Yuan, Yunfeng Zhang, Shuai Guo, Wenyi Wang, Irina Gaynanova

Abstract: Canonical correlation analysis (CCA) is a standard tool for studying associations between two data sources; however, it is not designed for data with count or proportion measurement types. In addition, while CCA uncovers common signals, it does not elucidate which signals are unique to each data source. To address these challenges, we propose a new framework for CCA based on exponential families w… ▽ More Canonical correlation analysis (CCA) is a standard tool for studying associations between two data sources; however, it is not designed for data with count or proportion measurement types. In addition, while CCA uncovers common signals, it does not elucidate which signals are unique to each data source. To address these challenges, we propose a new framework for CCA based on exponential families with explicit modeling of both common and source-specific signals. Unlike previous methods based on exponential families, the common signals from our model coincide with canonical variables in Gaussian CCA, and the unique signals are exactly orthogonal. These modeling differences lead to a non-trivial estimation via optimization with orthogonality constraints, for which we develop an iterative algorithm based on a splitting method. Simulations show on par or superior performance of the proposed method compared to the available alternatives. We apply the method to analyze associations between gene expressions and lipids concentrations in nutrigenomic study, and to analyze associations between two distinct cell-type deconvolution methods in prostate cancer tumor heterogeneity study. △ Less

Submitted 29 July, 2022; originally announced August 2022.

arXiv:2207.14445 [pdf]

Kendall's Tau for Two-Sample Inference Problems

Authors: Yi-Cheng Tai, Wei**g Wang, Martin T. Wells, National Yang Ming Chiao Tung U., Cornell U

Abstract: We consider a Kendall's tau measure between a binary group indicator and the continuous variable under investigation to develop a thorough two-sample comparison procedure. The measure serves as a useful alternative to the hazard ratio whose applicability depends on the proportional hazards assumption. For right censored data, we propose a weighted log-rank statistic with weights adapted to the cen… ▽ More We consider a Kendall's tau measure between a binary group indicator and the continuous variable under investigation to develop a thorough two-sample comparison procedure. The measure serves as a useful alternative to the hazard ratio whose applicability depends on the proportional hazards assumption. For right censored data, we propose a weighted log-rank statistic with weights adapted to the censoring distributions and develop theoretical properties of the derived estimators. In absence of censoring, the proposed estimator reduces to the WMW statistic. The proposed methodology is applied to analyze several data examples. △ Less

Submitted 28 July, 2022; originally announced July 2022.

Comments: 66 pages, 4 figures in the main text and 4 figures in the supporting information

arXiv:2206.04733 [pdf, other]

On Low-Complexity Quickest Intervention of Mutated Diffusion Processes Through Local Approximation

Authors: Qining Zhang, Honghao Wei, Weina Wang, Lei Ying

Abstract: We consider the problem of controlling a mutated diffusion process with an unknown mutation time. The problem is formulated as the quickest intervention problem with the mutation modeled by a change-point, which is a generalization of the quickest change-point detection (QCD). Our goal is to intervene in the mutated process as soon as possible while maintaining a low intervention cost with optimal… ▽ More We consider the problem of controlling a mutated diffusion process with an unknown mutation time. The problem is formulated as the quickest intervention problem with the mutation modeled by a change-point, which is a generalization of the quickest change-point detection (QCD). Our goal is to intervene in the mutated process as soon as possible while maintaining a low intervention cost with optimally chosen intervention actions. This model and the proposed algorithms can be applied to pandemic prevention (such as Covid-19) or misinformation containment. We formulate the problem as a partially observed Markov decision process (POMDP) and convert it to an MDP through the belief state of the change-point. We first propose a grid approximation approach to calculate the optimal intervention policy, whose computational complexity could be very high when the number of grids is large. In order to reduce the computational complexity, we further propose a low-complexity threshold-based policy through the analysis of the first-order approximation of the value functions in the ``local intervention'' regime. Simulation results show the low-complexity algorithm has a similar performance as the grid approximation and both perform much better than the QCD-based algorithms. △ Less

Submitted 9 June, 2022; originally announced June 2022.

arXiv:2202.02628 [pdf, other]

Improved Certified Defenses against Data Poisoning with (Deterministic) Finite Aggregation

Authors: Wenxiao Wang, Alexander Levine, Soheil Feizi

Abstract: Data poisoning attacks aim at manipulating model behaviors through distorting training data. Previously, an aggregation-based certified defense, Deep Partition Aggregation (DPA), was proposed to mitigate this threat. DPA predicts through an aggregation of base classifiers trained on disjoint subsets of data, thus restricting its sensitivity to dataset distortions. In this work, we propose an impro… ▽ More Data poisoning attacks aim at manipulating model behaviors through distorting training data. Previously, an aggregation-based certified defense, Deep Partition Aggregation (DPA), was proposed to mitigate this threat. DPA predicts through an aggregation of base classifiers trained on disjoint subsets of data, thus restricting its sensitivity to dataset distortions. In this work, we propose an improved certified defense against general poisoning attacks, namely Finite Aggregation. In contrast to DPA, which directly splits the training set into disjoint subsets, our method first splits the training set into smaller disjoint subsets and then combines duplicates of them to build larger (but not disjoint) subsets for training base classifiers. This reduces the worst-case impacts of poison samples and thus improves certified robustness bounds. In addition, we offer an alternative view of our method, bridging the designs of deterministic and stochastic aggregation-based certified defenses. Empirically, our proposed Finite Aggregation consistently improves certificates on MNIST, CIFAR-10, and GTSRB, boosting certified fractions by up to 3.05%, 3.87% and 4.77%, respectively, while kee** the same clean accuracies as DPA's, effectively establishing a new state of the art in (pointwise) certified robustness against data poisoning. △ Less

Submitted 14 July, 2022; v1 submitted 5 February, 2022; originally announced February 2022.

Comments: International Conference on Machine Learning (ICML), 2022

Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:22769-22783, 2022

Showing 1–50 of 274 results for author: Wang, W