-
Consistent community detection in multi-layer networks with heterogeneous differential privacy
Authors:
Yaoming Zhen,
Shirong Xu,
Junhui Wang
Abstract:
As network data has become increasingly prevalent, a substantial amount of attention has been paid to the privacy issue in publishing network data. One of the critical challenges for data publishers is to preserve the topological structures of the original network while protecting sensitive information. In this paper, we propose a personalized edge-flipping mechanism that allows data publishers to protect edge information based on each node's privacy preference. It can achieve differential privacy while preserving the community structure under the multi-layer degree-corrected stochastic block model after appropriate debiasing, and thus consistent community detection in the privatized multi-layer networks is achievable. Theoretically, we establish the consistency of community detection in the privatized multi-layer network and show that better privacy protection of edges can be obtained for a proportion of nodes while allowing other nodes to give up their privacy. Furthermore, the advantage of the proposed personalized edge-flipping mechanism is also supported by its numerical performance on various synthetic networks and a real-life multi-layer network.
Submitted 20 June, 2024;
originally announced June 2024.
-
Identifying Genetic Variants for Obesity Incorporating Prior Insights: Quantile Regression with Insight Fusion for Ultra-high Dimensional Data
Authors:
Jiantong Wang,
Heng Lian,
Yan Yu,
Heping Zhang
Abstract:
Obesity is widely recognized as a critical and pervasive health concern. We strive to identify important genetic risk factors for obesity from hundreds of thousands of single nucleotide polymorphisms (SNPs). We propose and apply a novel Quantile Regression with Insight Fusion (QRIF) approach that can integrate insights from established studies or domain knowledge to simultaneously perform variable selection and modeling for ultra-high dimensional genetic data, focusing on the high conditional quantiles of body mass index (BMI) that are of most interest. We discover interesting new SNPs and shed new light on a comprehensive view of the underlying genetic risk factors for different levels of BMI. This may potentially pave the way for more precise and targeted treatment strategies. The QRIF approach is intended to balance the trade-off between the prior insights and the observed data while being robust to potential false information. We further establish the desirable asymptotic properties under the challenging non-differentiable check loss functions via Huber loss approximation and nonconvex SCAD penalty via local linear approximation. Finally, we develop an efficient algorithm for the QRIF approach. Our simulation studies further demonstrate its effectiveness.
Submitted 17 June, 2024;
originally announced June 2024.
-
Data Shapley in One Training Run
Authors:
Jiachen T. Wang,
Prateek Mittal,
Dawn Song,
Ruoxi Jia
Abstract:
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts. However, existing approaches require re-training models on different data subsets, which is computationally intensive, foreclosing their application to large-scale models. Furthermore, they produce the same attribution score for any models produced by running the learning algorithm, meaning they cannot perform targeted attribution towards a specific model obtained from a single run of the algorithm. This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest. In its most efficient implementation, our technique incurs negligible additional runtime compared to standard model training. This dramatic efficiency improvement makes it possible to perform data attribution for the foundation model pretraining stage for the first time. We present several case studies that offer fresh insights into pretraining data's contribution and discuss their implications for copyright in generative AI and pretraining data curation.
Submitted 29 June, 2024; v1 submitted 16 June, 2024;
originally announced June 2024.
-
Measure This, Not That: Optimizing the Cost and Model-Based Information Content of Measurements
Authors:
Jialu Wang,
Zedong Peng,
Ryan Hughes,
Debangsu Bhattacharyya,
David E. Bernal Neira,
Alexander W. Dowling
Abstract:
Model-based design of experiments (MBDoE) is a powerful framework for selecting and calibrating science-based mathematical models from data. This work extends popular MBDoE workflows by proposing a convex mixed integer (non)linear programming (MINLP) problem to optimize the selection of measurements. The solver MindtPy is modified to support calculating the D-optimality objective and its gradient via an external package, \texttt{SciPy}, using the grey-box module in Pyomo. The new approach is demonstrated in two case studies: estimating highly correlated kinetics from a batch reactor and estimating transport parameters in a large-scale rotary packed bed for CO$_2$ capture. Both case studies show how examining the Pareto-optimal trade-offs between information content measured by A- and D-optimality versus measurement budget offers practical guidance for selecting measurements for scientific experiments.
Submitted 13 June, 2024;
originally announced June 2024.
-
Sensitivity Analysis for the Test-Negative Design
Authors:
Soumyabrata Kundu,
Peng Ding,
Xinran Li,
Jingshu Wang
Abstract:
The test-negative design has become popular for evaluating the effectiveness of post-licensure vaccines using observational data. In addition to its logistical convenience in data collection, the design is also believed to control for the differential health-care-seeking behavior between vaccinated and unvaccinated individuals, which is an important yet often unmeasured confounder between vaccination and infection. Hence, the design has been employed routinely to monitor seasonal flu vaccines and more recently to measure COVID-19 vaccine effectiveness. Despite its popularity, the design has been questioned, in particular about its ability to fully control for unmeasured confounding. In this paper, we explore deviations from a perfect test-negative design, and propose various sensitivity analysis methods for estimating the effect of vaccination measured by the causal odds ratio on the subpopulation of individuals with good health-care-seeking behavior. We start with point identification of the causal odds ratio under a test-negative design, considering two forms of assumptions on the unmeasured confounder. These assumptions then lead to two approaches for conducting sensitivity analysis, addressing the influence of the unmeasured confounding in different ways. Specifically, one approach investigates partial control for the unmeasured confounder in the test-negative design, while the other examines the impact of the unmeasured confounder on both vaccination and infection. Furthermore, these approaches can be combined to provide narrower bounds on the true causal odds ratio, and can be further extended to sharpen the bounds by restricting the treatment effect heterogeneity. Finally, we apply the proposed methods to evaluate the effectiveness of COVID-19 vaccines using observational data from test-negative designs.
Submitted 11 June, 2024;
originally announced June 2024.
-
Hardness of Learning Neural Networks under the Manifold Hypothesis
Authors:
Bobak T. Kiani,
Jason Wang,
Melanie Weber
Abstract:
The manifold hypothesis presumes that high-dimensional data lies on or near a low-dimensional manifold. While the utility of encoding geometric structure has been demonstrated empirically, rigorous analysis of its impact on the learnability of neural networks is largely missing. Several recent results have established hardness results for learning feedforward and equivariant neural networks under i.i.d. Gaussian or uniform Boolean data distributions. In this paper, we investigate the hardness of learning under the manifold hypothesis. We ask which minimal assumptions on the curvature and regularity of the manifold, if any, render the learning problem efficiently learnable. We prove that learning is hard under input manifolds of bounded curvature by extending proofs of hardness in the SQ and cryptographic settings for Boolean data inputs to the geometric setting. On the other hand, we show that additional assumptions on the volume of the data manifold alleviate these fundamental limitations and guarantee learnability via a simple interpolation argument. Notable instances of this regime are manifolds which can be reliably reconstructed via manifold learning. Looking forward, we comment on and empirically explore intermediate regimes of manifolds, which have heterogeneous features commonly found in real world data.
Submitted 3 June, 2024;
originally announced June 2024.
-
Improving Generalization and Convergence by Enhancing Implicit Regularization
Authors:
Mingze Wang,
Haotian He,
Jinbo Wang
Zilin Wang,
Guanhua Huang,
Feiyu Xiong,
Zhiyu Li,
Weinan E,
Lei Wu
Abstract:
In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that IRE can be practically incorporated with generic base optimizers without introducing significant computational overload. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ speed-up compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees, showing that IRE can substantially accelerate the convergence towards flat minima in Sharpness-aware Minimization (SAM).
Submitted 31 May, 2024;
originally announced May 2024.
-
Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models
Authors:
Jiankun Wang,
Sumyeong Ahn,
Taykhoom Dalal,
Xiaodan Zhang,
Weishen Pan,
Qiannan Zhang,
Bin Chen,
Hiroko H. Dodge,
Fei Wang,
Jiayu Zhou
Abstract:
Alzheimer's disease (AD) is the fifth-leading cause of death among Americans aged 65 and older. Screening and early detection of AD and related dementias (ADRD) are critical for timely intervention and for identifying clinical trial participants. The widespread adoption of electronic health records (EHRs) offers an important resource for developing ADRD screening tools such as machine-learning-based predictive models. Recent advancements in large language models (LLMs) demonstrate their unprecedented capability of encoding knowledge and performing reasoning, which offers them strong potential for enhancing risk prediction. This paper proposes a novel pipeline that augments risk prediction by leveraging the few-shot inference power of LLMs to make predictions on cases where traditional supervised learning methods (SLs) may not excel. Specifically, we develop a collaborative pipeline that combines SLs and LLMs via a confidence-driven decision-making mechanism, leveraging the strengths of SLs in clear-cut cases and LLMs in more complex scenarios. We evaluate this pipeline using a real-world EHR data warehouse from Oregon Health & Science University (OHSU) Hospital, encompassing EHRs from over 2.5 million patients and more than 20 million patient encounters. Our results show that our proposed approach effectively combines the power of SLs and LLMs, offering significant improvements in predictive performance. This advancement holds promise for revolutionizing ADRD screening and early detection practices, with potential implications for better strategies of patient management and thus improving healthcare.
Submitted 25 May, 2024;
originally announced May 2024.
-
Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances
Authors:
Jie Wang,
March Boedihardjo,
Yao Xie
Abstract:
Optimal transport has been very successful for various machine learning tasks; however, it is known to suffer from the curse of dimensionality. Hence, dimensionality reduction is desirable when applied to high-dimensional data with low-dimensional structures. The kernel max-sliced (KMS) Wasserstein distance is developed for this purpose by finding an optimal nonlinear mapping that reduces data to one dimension before computing the Wasserstein distance. However, its theoretical properties have not yet been fully developed. In this paper, we provide sharp finite-sample guarantees under milder technical assumptions compared with the state of the art for the KMS $p$-Wasserstein distance between two empirical distributions with $n$ samples for general $p\in[1,\infty)$. Algorithm-wise, we show that computing the KMS $2$-Wasserstein distance is NP-hard, and then we further propose a semidefinite relaxation (SDR) formulation (which can be solved efficiently in polynomial time) and provide a relaxation gap for the SDP solution. We provide numerical examples to demonstrate the good performance of our scheme for high-dimensional two-sample testing.
Submitted 29 May, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Optimal Sequential Procedure for Early Detection of Multiple Side Effects
Authors:
Jiayue Wang,
Ben Boukai
Abstract:
In this paper, we propose an optimal sequential procedure for the early detection of potential side effects resulting from the administration of some treatment (e.g., a vaccine). The results presented here extend previous results obtained in Wang and Boukai (2024), who study the single side effect case, to the case of two (or more) side effects. While the sequential procedure we employ simultaneously monitors several of the treatment's side effects, the $(α, β)$-optimal test we propose does not require any information about the inter-correlation between these potential side effects. However, in all of the subsequent analyses, including the derivations of the exact expressions of the Average Sample Number (ASN), the Power function, and the properties of the post-test (or post-detection) estimators, we account specifically for the correlation between the potential side effects. In real-life applications (such as post-marketing surveillance), the number of available observations is large enough to justify asymptotic analyses of the properties of the sequential procedure (testing and post-detection estimation). Accordingly, we also derive the consistency and asymptotic normality of our post-test estimators, results which enable us to also provide (asymptotic, post-detection) confidence intervals for the probabilities of various side effects. Moreover, to compare two specific side effects, their relative risk plays an important role. We derive the distribution of the estimated relative risk in the asymptotic framework to provide appropriate inference. To illustrate the theoretical results presented, we provide two detailed examples based on data on COVID-19 vaccine side effects collected in Nigeria (see Ilori et al. (2022)).
Submitted 14 May, 2024;
originally announced May 2024.
-
Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits
Authors:
Jiachen T. Wang,
Tianji Yang,
James Zou,
Yongchan Kwon,
Ruoxi Jia
Abstract:
Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research. Data selection is considered a standard application of Data Shapley. However, its data selection performance has been shown to be inconsistent across settings in the literature. This study aims to deepen our understanding of this phenomenon. We introduce a hypothesis testing framework and show that Data Shapley's performance can be no better than random selection without specific constraints on utility functions. We identify a class of utility functions, monotonically transformed modular functions, within which Data Shapley optimally selects data. Based on this insight, we propose a heuristic for predicting Data Shapley's effectiveness in data selection tasks. Our experiments corroborate these findings, adding new insights into when Data Shapley may or may not succeed.
Submitted 6 May, 2024;
originally announced May 2024.
-
Debiasing Machine Unlearning with Counterfactual Examples
Authors:
Ziheng Chen,
Jia Wang,
Jun Zhuang,
Abbavaram Gowtham Reddy,
Fabrizio Silvestri,
Jin Huang,
Kaushiki Nag,
Kun Kuang,
Xin Ning,
Gabriele Tolomei
Abstract:
The right to be forgotten (RTBF) seeks to safeguard individuals from the enduring effects of their historical actions by implementing machine-learning techniques. These techniques facilitate the deletion of previously acquired knowledge without requiring extensive model retraining. However, they often overlook a critical issue: bias in the unlearning process. This bias emerges from two main sources: (1) data-level bias, characterized by uneven data removal, and (2) algorithm-level bias, which leads to the contamination of the remaining dataset, thereby degrading model accuracy. In this work, we analyze the causal factors behind the unlearning process and mitigate biases at both data and algorithmic levels. Specifically, we introduce an intervention-based approach, where knowledge to forget is erased with a debiased dataset. In addition, we guide the forgetting procedure by leveraging counterfactual examples, as they maintain semantic data consistency without hurting performance on the remaining dataset. Experimental results demonstrate that our method outperforms existing machine unlearning baselines on evaluation metrics.
Submitted 24 April, 2024;
originally announced April 2024.
-
RealTCD: Temporal Causal Discovery from Interventional Data with Large Language Model
Authors:
Peiwen Li,
Xin Wang,
Zeyang Zhang,
Yuan Meng,
Fang Shen,
Yue Li,
Jialong Wang,
Yang Li,
Wenwu Zhu
Abstract:
In the field of Artificial Intelligence for Information Technology Operations, causal discovery is pivotal for operation and maintenance of graph construction, facilitating downstream industrial tasks such as root cause analysis. Temporal causal discovery, as an emerging method, aims to identify temporal causal relationships between variables directly from observations by utilizing interventional data. However, existing methods mainly focus on synthetic datasets with heavy reliance on intervention targets and ignore the textual information hidden in real-world systems, failing to conduct causal discovery for real industrial scenarios. To tackle this problem, in this paper we propose to investigate temporal causal discovery in industrial scenarios, which faces two critical challenges: 1) how to discover causal relationships without the interventional targets that are costly to obtain in practice, and 2) how to discover causal relations via leveraging the textual information in systems which can be complex yet abundant in industrial contexts. To address these challenges, we propose the RealTCD framework, which is able to leverage domain knowledge to discover temporal causal relationships without interventional targets. Specifically, we first develop a score-based temporal causal discovery method capable of discovering causal relations for root cause analysis without relying on interventional targets through strategic masking and regularization. Furthermore, by employing Large Language Models (LLMs) to handle texts and integrate domain knowledge, we introduce LLM-guided meta-initialization to extract the meta-knowledge from textual information hidden in systems to boost the quality of discovery. We conduct extensive experiments on simulation and real-world datasets to show the superiority of our proposed RealTCD framework over existing baselines in discovering temporal causal structures.
Submitted 26 May, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
An Economic Solution to Copyright Challenges of Generative AI
Authors:
Jiachen T. Wang,
Zhun Deng,
Hiroaki Chiba-Okabe,
Boaz Barak,
Weijie J. Su
Abstract:
Generative artificial intelligence (AI) systems are trained on large data corpora to generate new pieces of text, images, videos, and other media. There is growing concern that such systems may infringe on the copyright interests of training data contributors. To address the copyright challenges of generative AI, we propose a framework that compensates copyright owners proportionally to their contributions to the creation of AI-generated content. The metric for contributions is quantitatively determined by leveraging the probabilistic nature of modern generative AI models and using techniques from cooperative game theory in economics. This framework enables a platform where AI developers benefit from access to high-quality training data, thus improving model performance. Meanwhile, copyright owners receive fair compensation, driving the continued provision of relevant data for generative model training. Experiments demonstrate that our framework successfully identifies the most relevant data sources used in artwork generation, ensuring a fair and interpretable distribution of revenues among copyright owners.
Submitted 24 April, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
HiGraphDTI: Hierarchical Graph Representation Learning for Drug-Target Interaction Prediction
Authors:
Bin Liu,
Siqi Wu,
Jin Wang,
Xin Deng,
Ao Zhou
Abstract:
The discovery of drug-target interactions (DTIs) plays a crucial role in pharmaceutical development. Deep learning models achieve more accurate results in DTI prediction due to their ability to extract robust and expressive features from drug and target chemical structures. However, existing deep learning methods typically generate drug features by aggregating molecular atom representations, ignoring the chemical properties carried by motifs, i.e., substructures of the molecular graph. The atom-drug double-level molecular representation learning cannot fully exploit structure information and fails to interpret the DTI mechanism from the motif perspective. In addition, sequential model-based target feature extraction either fuses limited contextual information or requires expensive computational resources. To tackle the above issues, we propose a hierarchical graph representation learning-based DTI prediction method (HiGraphDTI). Specifically, HiGraphDTI learns hierarchical drug representations from triple-level molecular graphs to thoroughly exploit chemical information embedded in atoms, motifs, and molecules. Then, an attentional feature fusion module incorporates information from different receptive fields to extract expressive target features. Last, the hierarchical attention mechanism identifies crucial molecular segments, which offers complementary views for interpreting interaction mechanisms. The experimental results not only demonstrate the superiority of HiGraphDTI over state-of-the-art methods, but also confirm the practical ability of our model in interaction interpretation and new DTI discovery.
Submitted 16 April, 2024;
originally announced April 2024.
-
HELLINGER-UCB: A novel algorithm for stochastic multi-armed bandit problem and cold start problem in recommender system
Authors:
Ruibo Yang,
Jiazhou Wang,
Andrew Mullhaupt
Abstract:
In this paper, we study the stochastic multi-armed bandit problem, where the reward is driven by an unknown random variable. We propose a new variant of the Upper Confidence Bound (UCB) algorithm called Hellinger-UCB, which leverages the squared Hellinger distance to build the upper confidence bound. We prove that Hellinger-UCB reaches the theoretical lower bound. We also show that Hellinger-UCB has a solid statistical interpretation. Numerical experiments comparing Hellinger-UCB with other variants of the UCB algorithm show that it is effective in finite time horizons. As a real-world example, we apply the Hellinger-UCB algorithm to solve the cold-start problem for a content recommender system of a financial app. Under reasonable assumptions, the Hellinger-UCB algorithm also offers lower latency, a convenient and important feature. The online experiment also illustrates that Hellinger-UCB outperforms both KL-UCB and UCB1 in the sense of a higher click-through rate (CTR).
Submitted 15 April, 2024;
originally announced April 2024.
-
Efficient Surgical Tool Recognition via HMM-Stabilized Deep Learning
Authors:
Haifeng Wang,
Hao Xu,
Jun Wang,
Jian Zhou,
Ke Deng
Abstract:
Recognizing various surgical tools, actions and phases from surgery videos is an important problem in computer vision with exciting clinical applications. Existing deep-learning-based methods for this problem either process each surgical video as a series of independent images without considering their dependence, or rely on complicated deep learning models to account for the dependence between video frames. In this study, we reveal through exploratory data analysis that surgical videos have a relatively simple semantic structure, where the presence of surgical phases and tools can be well modeled by a compact hidden Markov model (HMM). Based on this observation, we propose an HMM-stabilized deep learning method for tool presence detection. A wide range of experiments confirm that the proposed approaches achieve better performance with lower training and running costs, and support more flexible ways to construct and utilize training data in scenarios where not all surgery videos of interest are extensively labelled. These results suggest that popular deep learning approaches with over-complicated model structures may suffer from inefficient utilization of data, and that wisely integrating ingredients of deep learning and statistical learning may lead to more powerful algorithms that simultaneously enjoy competitive performance, transparent interpretation and convenient model training.
Submitted 7 April, 2024;
originally announced April 2024.
-
TS-CausalNN: Learning Temporal Causal Relations from Non-linear Non-stationary Time Series Data
Authors:
Omar Faruque,
Sahara Ali,
Xue Zheng,
Jianwu Wang
Abstract:
The growing availability and importance of time series data across various domains, including environmental science, epidemiology, and economics, has led to an increasing need for time-series causal discovery methods that can identify the intricate relationships in the non-stationary, non-linear, and often noisy real world data. However, the majority of current time series causal discovery methods assume stationarity and linear relations in data, making them infeasible for the task. Further, the recent deep learning-based methods rely on the traditional causal structure learning approaches making them computationally expensive. In this paper, we propose a Time-Series Causal Neural Network (TS-CausalNN) - a deep learning technique to discover contemporaneous and lagged causal relations simultaneously. Our proposed architecture comprises (i) convolutional blocks comprising parallel custom causal layers, (ii) acyclicity constraint, and (iii) optimization techniques using the augmented Lagrangian approach. In addition to the simple parallel design, an advantage of the proposed model is that it naturally handles the non-stationarity and non-linearity of the data. Through experiments on multiple synthetic and real world datasets, we demonstrate the empirical proficiency of our proposed approach as compared to several state-of-the-art methods. The inferred graphs for the real world dataset are in good agreement with the domain understanding.
Submitted 1 April, 2024;
originally announced April 2024.
-
Non-Convex Robust Hypothesis Testing using Sinkhorn Uncertainty Sets
Authors:
Jie Wang,
Rui Gao,
Yao Xie
Abstract:
We present a new framework to address the non-convex robust hypothesis testing problem, wherein the goal is to seek the optimal detector that minimizes the maximum of worst-case type-I and type-II risk functions. The distributional uncertainty sets are constructed to center around the empirical distribution derived from samples based on Sinkhorn discrepancy. Given that the objective involves non-convex, non-smooth probabilistic functions that are often intractable to optimize, existing methods resort to approximations rather than exact solutions. To tackle the challenge, we introduce an exact mixed-integer exponential conic reformulation of the problem, which can be solved into a global optimum with a moderate amount of input data. Subsequently, we propose a convex approximation, demonstrating its superiority over current state-of-the-art methodologies in literature. Furthermore, we establish connections between robust hypothesis testing and regularized formulations of non-robust risk functions, offering insightful interpretations. Our numerical study highlights the satisfactory testing performance and computational efficiency of the proposed framework.
Submitted 21 March, 2024;
originally announced March 2024.
-
The risks of risk assessment: causal blind spots when using prediction models for treatment decisions
Authors:
Nan van Geloven,
Ruth H Keogh,
Wouter van Amsterdam,
Giovanni Cinà,
Jesse H. Krijthe,
Niels Peek,
Kim Luijken,
Sara Magliacane,
Paweł Morzywołek,
Thijs van Ommen,
Hein Putter,
Matthew Sperrin,
Junfeng Wang,
Daniala L. Weir,
Vanessa Didelez
Abstract:
Prediction models are increasingly proposed for guiding treatment decisions, but most fail to address the special role of treatments, leading to inappropriate use. This paper highlights the limitations of using standard prediction models for treatment decision support. We identify `causal blind spots' in three common approaches to handling treatments in prediction modelling: including treatment as a predictor, restricting data based on treatment status and ignoring treatments. When predictions are used to inform treatment decisions, confounders, colliders and mediators, as well as changes in treatment protocols over time may lead to misinformed decision-making. We illustrate potential harmful consequences in several medical applications. We advocate for an extension of guidelines for development, reporting and evaluation of prediction models to ensure that the intended use of the model is matched to an appropriate risk estimand. When prediction models are intended to inform treatment decisions, prediction models should specify upfront the treatment decisions they aim to support and target a prediction estimand in line with that goal. This requires a shift towards developing predictions under the specific treatment options under consideration (`predictions under interventions'). Predictions under interventions need causal reasoning and inference techniques during development and validation. We argue that this will improve the efficacy of prediction models in guiding treatment decisions and prevent potential negative effects on patient outcomes.
Submitted 6 May, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Mini-Hes: A Parallelizable Second-order Latent Factor Analysis Model
Authors:
Jialiang Wang,
Weiling Li,
Yurong Zhong,
Xin Luo
Abstract:
Interactions among a large number of entities are naturally high-dimensional and incomplete (HDI) in many big-data-related tasks. Behavioral characteristics of users are hidden in these interactions; hence, effective representation of HDI data is a fundamental task for understanding user behaviors. The latent factor analysis (LFA) model has proven to be effective in representing HDI data. The performance of an LFA model relies heavily on its training process, which is a non-convex optimization. It has been proven that incorporating local curvature and preprocessing gradients during its training process can lead to superior performance compared to LFA models built with first-order family methods. However, with the escalation of data volume, the feasibility of second-order algorithms encounters challenges. To address this pivotal issue, this paper proposes a mini-block diagonal Hessian-free (Mini-Hes) optimization for building an LFA model. It leverages the dominant diagonal blocks in the generalized Gauss-Newton matrix based on the analysis of the Hessian matrix of the LFA model and serves as an intermediary strategy bridging the gap between first-order and second-order optimization methods. Experimental results indicate that, with Mini-Hes, the LFA model outperforms several state-of-the-art models on the missing data estimation task on multiple real HDI datasets from recommender systems. (The source code of Mini-Hes is available at https://github.com/Goallow/Mini-Hes)
Submitted 19 February, 2024;
originally announced February 2024.
-
Timer: Generative Pre-trained Transformers Are Large Time Series Models
Authors:
Yong Liu,
Haoran Zhang,
Chenyu Li,
Xiangdong Huang,
Jianmin Wang,
Mingsheng Long
Abstract:
Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world data-scarce scenarios, which can be concealed due to the performance saturation with small models on current benchmarks. Meanwhile, large models have demonstrated great powers in these scenarios through large-scale pre-training. Continuous progress has been achieved with the emergence of large language models, exhibiting unprecedented abilities such as few-shot generalization, scalability, and task generality, which are however absent in small deep models. To change the status quo of training scenario-specific small models from scratch, this paper aims at the early development of large time series models (LTSM). During pre-training, we curate large-scale datasets with up to 1 billion time points, unify heterogeneous time series into single-series sequence (S3) format, and develop the GPT-style architecture toward LTSMs. To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task. The outcome of this study is a Time Series Transformer (Timer), which is generative pre-trained by next token prediction and adapted to various downstream tasks with promising capabilities as an LTSM. Code and datasets are available at: https://github.com/thuml/Large-Time-Series-Model.
Submitted 4 June, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Early Detection of Treatments Side Effect: A Sequential Approach
Authors:
Jiayue Wang,
Ben Boukai
Abstract:
With the emergence and spread of infectious diseases with pandemic potential, such as COVID-19, the urgency for vaccine development has led to unprecedented compressed and accelerated schedules that shortened the standard development timeline. In a relatively short time, the leading pharmaceutical companies received an Emergency Use Authorization (EUA) for the vaccine's en masse deployment. To monitor the potential side effect(s) of the vaccine during the (initial) vaccination campaign, we developed an optimal sequential test that allows for the early detection of potential side effect(s). This test employs a rule to stop the vaccination process once the observed number of side effect incidents exceeds a certain (pre-determined) threshold. The optimality of the proposed sequential test is justified when compared with the (α, β) optimality of the non-randomized fixed-sample Uniformly Most Powerful (UMP) test. In the case of a single side effect, we study the properties of the sequential test and derive the exact expressions of the Average Sample Number (ASN) curve of the stopping time (and its variance) via the regularized incomplete beta function. Additionally, we derive the asymptotic distribution of the relative savings in ASN as compared to the maximal sample size. Moreover, we construct the post-test parameter estimate and study its sampling properties, including its asymptotic behavior under local-type alternatives. These limiting-behavior results include the consistency and asymptotic normality of the post-test parameter estimator. We conclude the paper with a small simulation study illustrating the asymptotic performance of the point and interval estimation and provide a detailed example, based on COVID-19 side effect data (see Beatty et al. (2021)), of our suggested testing procedure.
Submitted 24 January, 2024;
originally announced January 2024.
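The stopping rule described in this abstract (halt once the observed number of side effect incidents exceeds a pre-determined threshold) can be sketched as follows. This is only an illustration of a curtailed sequential count rule under assumed inputs; the function name, the 0/1 outcome encoding, and the maximal sample size argument are hypothetical, and the paper's optimal threshold choice is not reproduced here.

```python
from typing import Sequence, Tuple


def sequential_stop(outcomes: Sequence[int], threshold: int,
                    max_n: int) -> Tuple[int, bool]:
    """Scan 0/1 outcomes in order; stop as soon as the incident count exceeds threshold.

    Returns (number of subjects observed, whether the process was stopped early).
    """
    incidents = 0
    observed = 0
    for x in outcomes[:max_n]:
        observed += 1
        incidents += x  # x is 1 for a side effect incident, 0 otherwise
        if incidents > threshold:
            return observed, True
    # Threshold never exceeded within the maximal sample size.
    return observed, False
```

For example, with `threshold=2` the rule stops at the third observed incident, so fewer subjects are observed than the maximal sample size whenever incidents accumulate early.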
-
Full Bayesian Significance Testing for Neural Networks
Authors:
Zehua Liu,
Zimeng Li,
Jingyuan Wang,
Yue He
Abstract:
Significance testing aims to determine whether a proposition about the population distribution is the truth or not given observations. However, traditional significance testing often needs to derive the distribution of the testing statistic, failing to deal with complex nonlinear relationships. In this paper, we propose to conduct Full Bayesian Significance Testing for neural networks, called \textit{n}FBST, to overcome the limitation in relationship characterization of traditional approaches. A Bayesian neural network is utilized to fit the nonlinear and multi-dimensional relationships with small errors and avoid hard theoretical derivation by computing the evidence value. Besides, \textit{n}FBST can test not only global significance but also local and instance-wise significance, which previous testing methods don't focus on. Moreover, \textit{n}FBST is a general framework that can be extended based on the measures selected, such as Grad-\textit{n}FBST, LRP-\textit{n}FBST, DeepLIFT-\textit{n}FBST, LIME-\textit{n}FBST. A range of experiments on both simulated and real data are conducted to show the advantages of our method.
Submitted 24 January, 2024;
originally announced January 2024.
-
Efficient Data Shapley for Weighted Nearest Neighbor Algorithms
Authors:
Jiachen T. Wang,
Prateek Mittal,
Ruoxi Jia
Abstract:
This work aims to address an open problem in data valuation literature concerning the efficient computation of Data Shapley for weighted $K$ nearest neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label KNN with discretized weights as the utility function, we reframe the computation of WKNN-Shapley into a counting problem and introduce a quadratic-time algorithm, presenting a notable improvement from $O(N^K)$, the best result from existing literature. We develop a deterministic approximation algorithm that further improves computational efficiency while maintaining the key fairness properties of the Shapley value. Through extensive experiments, we demonstrate WKNN-Shapley's computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart.
Submitted 19 January, 2024;
originally announced January 2024.
-
Understanding Heterophily for Graph Neural Networks
Authors:
Junfu Wang,
Yuanfang Guo,
Liang Yang,
Yunhong Wang
Abstract:
Graphs with heterophily have been regarded as challenging scenarios for Graph Neural Networks (GNNs), where nodes are connected with dissimilar neighbors through various patterns. In this paper, we present theoretical understandings of the impacts of different heterophily patterns for GNNs by incorporating the graph convolution (GC) operations into fully connected networks via the proposed Heterophilous Stochastic Block Models (HSBM), a general random graph model that can accommodate diverse heterophily patterns. Firstly, we show that by applying a GC operation, the separability gains are determined by two factors, i.e., the Euclidean distance of the neighborhood distributions and $\sqrt{\mathbb{E}\left[\operatorname{deg}\right]}$, where $\mathbb{E}\left[\operatorname{deg}\right]$ is the averaged node degree. It reveals that the impact of heterophily on classification needs to be evaluated alongside the averaged node degree. Secondly, we show that the topological noise has a detrimental impact on separability, which is equivalent to degrading $\mathbb{E}\left[\operatorname{deg}\right]$. Finally, when applying multiple GC operations, we show that the separability gains are determined by the normalized distance of the $l$-powered neighborhood distributions. It indicates that the nodes still possess separability as $l$ goes to infinity in a wide range of regimes. Extensive experiments on both synthetic and real-world data verify the effectiveness of our theory.
Submitted 4 June, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Co-Clustering Multi-View Data Using the Latent Block Model
Authors:
Joshua Tobin,
Michaela Black,
James Ng,
Debbie Rankin,
Jonathan Wallace,
Catherine Hughes,
Leane Hoey,
Adrian Moore,
Jinling Wang,
Geraldine Horigan,
Paul Carlin,
Helene McNulty,
Anne M Molloy,
Mimi Zhang
Abstract:
The Latent Block Model (LBM) is a prominent model-based co-clustering method, returning parametric representations of each block cluster and allowing the use of well-grounded model selection methods. The LBM, while adapted in the literature to handle different feature types, cannot be applied to datasets consisting of multiple disjoint sets of features, termed views, for a common set of observations. In this work, we introduce the multi-view LBM, extending the LBM method to multi-view data, where each view marginally follows an LBM. In the case of two views, the dependence between them is captured by a cluster membership matrix, and we aim to learn the structure of this matrix. We develop a likelihood-based approach in which parameter estimation uses a stochastic EM algorithm integrating a Gibbs sampler, and an ICL criterion is derived to determine the number of row and column clusters in each view. To motivate the application of multi-view methods, we extend recent work developing hypothesis tests for the null hypothesis that clusters of observations in each view are independent of each other. The testing procedure is integrated into the model estimation strategy. Furthermore, we introduce a penalty scheme to generate sparse row clusterings. We verify the performance of the developed algorithm using synthetic datasets, and provide guidance for optimal parameter selection. Finally, the multi-view co-clustering method is applied to a complex genomics dataset, and is shown to provide new insights for high-dimensional multi-view problems.
Submitted 9 January, 2024;
originally announced January 2024.
-
Large Language Model for Causal Decision Making
Authors:
Haitao Jiang,
Lin Ge,
Yuhe Gao,
Jianian Wang,
Rui Song
Abstract:
Large Language Models (LLMs) have shown their success in language understanding and reasoning on general topics. However, their capability to perform inference based on user-specified structured data and knowledge in corpus-rare concepts, such as causal decision-making is still limited. In this work, we explore the possibility of fine-tuning an open-sourced LLM into LLM4Causal, which can identify the causal task, execute a corresponding function, and interpret its numerical results based on users' queries and the provided dataset. Meanwhile, we propose a data generation process for more controllable GPT prompting and present two instruction-tuning datasets: (1) Causal-Retrieval-Bench for causal problem identification and input parameter extraction for causal function calling and (2) Causal-Interpret-Bench for in-context causal interpretation. By conducting end-to-end evaluations and two ablation studies, we showed that LLM4Causal can deliver end-to-end solutions for causal problems and provide easy-to-understand answers, which significantly outperforms the baselines.
Submitted 11 April, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Mediation Analysis with Mendelian Randomization and Efficient Multiple GWAS Integration
Authors:
Rita Qiuran Lyu,
Chong Wu,
Xinwei Ma,
Jingshen Wang
Abstract:
Mediation analysis is a powerful tool for studying causal pathways between exposure, mediator, and outcome variables of interest. While classical mediation analysis using observational data often requires strong and sometimes unrealistic assumptions, such as unconfoundedness, Mendelian Randomization (MR) avoids unmeasured confounding bias by employing genetic variations as instrumental variables. We develop a novel MR framework for mediation analysis with genome-wide association study (GWAS) summary data, and provide solid statistical guarantees. Our framework employs carefully crafted estimating equations, allowing for different sets of genetic variations to instrument the exposure and the mediator, to efficiently integrate information stored in three independent GWAS. As part of this endeavor, we demonstrate that in mediation analysis, the challenge raised by instrument selection goes beyond the well-known winner's curse issue, and therefore, addressing it requires special treatment. We then develop bias correction techniques to address the instrument selection issue and the commonly encountered measurement error bias issue. Collectively, through our theoretical investigations, we show that our framework provides valid statistical inference for both direct and mediation effects with enhanced statistical efficiency compared to existing methods. We further illustrate the finite-sample performance of our approach through simulation experiments and a case study.
Submitted 17 May, 2024; v1 submitted 16 December, 2023;
originally announced December 2023.
-
Adaptive Experiments Toward Learning Treatment Effect Heterogeneity
Authors:
Waverly Wei,
Xinwei Ma,
Jingshen Wang
Abstract:
Understanding treatment effect heterogeneity has become an increasingly popular task in various fields, as it helps design personalized advertisements in e-commerce or targeted treatment in biomedical studies. However, most of the existing work in this research area focused on either analyzing observational data based on strong causal assumptions or conducting post hoc analyses of randomized controlled trial data, and there has been limited effort dedicated to the design of randomized experiments specifically for uncovering treatment effect heterogeneity. In the manuscript, we develop a framework for designing and analyzing response adaptive experiments toward better learning treatment effect heterogeneity. Concretely, we provide response adaptive experimental design frameworks that sequentially revise the data collection mechanism according to the accrued evidence during the experiment. Such design strategies allow for the identification of subgroups with the largest treatment effects with enhanced statistical efficiency. The proposed frameworks not only unify adaptive enrichment designs and response-adaptive randomization designs but also complement A/B test designs in e-commerce and randomized trial designs in clinical settings. We demonstrate the merit of our design with theoretical justifications and in simulation studies with synthetic e-commerce and clinical trial data.
Submitted 13 December, 2023; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Hacking Task Confounder in Meta-Learning
Authors:
Jingyao Wang,
Yi Ren,
Zeen Song,
Jianqi Zhang,
Changwen Zheng,
Wenwen Qiang
Abstract:
Meta-learning enables rapid generalization to new tasks by learning knowledge from various tasks. It is intuitively assumed that as the training progresses, a model will acquire richer knowledge, leading to better generalization performance. However, our experiments reveal an unexpected result: there is negative knowledge transfer between tasks, affecting generalization performance. To explain this phenomenon, we conduct Structural Causal Models (SCMs) for causal analysis. Our investigation uncovers the presence of spurious correlations between task-specific causal factors and labels in meta-learning. Furthermore, the confounding factors differ across different batches. We refer to these confounding factors as "Task Confounders". Based on these findings, we propose a plug-and-play Meta-learning Causal Representation Learner (MetaCRL) to eliminate task confounders. It encodes decoupled generating factors from multiple tasks and utilizes an invariant-based bi-level optimization mechanism to ensure their causality for meta-learning. Extensive experiments on various benchmark datasets demonstrate that our work achieves state-of-the-art (SOTA) performance.
Submitted 29 May, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Multi-granularity Causal Structure Learning
Authors:
Jiaxuan Liang,
Jun Wang,
Guoxian Yu,
Shuyin Xia,
Guoyin Wang
Abstract:
Unveiling, modeling, and comprehending the causal mechanisms underpinning natural phenomena stand as fundamental endeavors across myriad scientific disciplines. Meanwhile, new knowledge emerges when discovering causal relationships from data. Existing causal learning algorithms predominantly focus on the isolated effects of variables, overlooking the intricate interplay of multiple variables and their collective behavioral patterns. Furthermore, the ubiquity of high-dimensional data exacts a substantial temporal cost for causal algorithms. In this paper, we develop a novel method called MgCSL (Multi-granularity Causal Structure Learning), which first leverages a sparse auto-encoder to explore coarse-graining strategies and causal abstractions from micro-variables to macro-ones. MgCSL then takes multi-granularity variables as inputs to train multilayer perceptrons and to delve into the causality between variables. To enhance the efficacy on high-dimensional data, MgCSL introduces a simplified acyclicity constraint to adeptly search the directed acyclic graph among variables. Experimental results show that MgCSL outperforms competitive baselines and finds explainable causal connections on fMRI datasets.
Submitted 12 December, 2023; v1 submitted 9 December, 2023;
originally announced December 2023.
-
On the Estimation Performance of Generalized Power Method for Heteroscedastic Probabilistic PCA
Authors:
Jinxin Wang,
Chonghe Jiang,
Huikang Liu,
Anthony Man-Cho So
Abstract:
The heteroscedastic probabilistic principal component analysis (PCA) technique, a variant of the classic PCA that considers data heterogeneity, is receiving more and more attention in the data science and signal processing communities. In this paper, to estimate the underlying low-dimensional linear subspace (simply called \emph{ground truth}) from available heterogeneous data samples, we consider the associated non-convex maximum-likelihood estimation problem, which involves maximizing a sum of heterogeneous quadratic forms over an orthogonality constraint (HQPOC). We propose a first-order method -- generalized power method (GPM) -- to tackle the problem and establish its \emph{estimation performance} guarantee. Specifically, we show that, given a suitable initialization, the distances between the iterates generated by GPM and the ground truth decrease at least geometrically to some threshold associated with the residual part of certain "population-residual decomposition". In establishing the estimation performance result, we prove a novel local error bound property of another closely related optimization problem, namely quadratic optimization with orthogonality constraint (QPOC), which is new and can be of independent interest. Numerical experiments are conducted to demonstrate the superior performance of GPM in both Gaussian noise and sub-Gaussian noise settings.
Submitted 6 December, 2023;
originally announced December 2023.
-
Risk-based decision making: estimands for sequential prediction under interventions
Authors:
Kim Luijken,
Paweł Morzywołek,
Wouter van Amsterdam,
Giovanni Cinà,
Jeroen Hoogland,
Ruth Keogh,
Jesse Krijthe,
Sara Magliacane,
Thijs van Ommen,
Niels Peek,
Hein Putter,
Maarten van Smeden,
Matthew Sperrin,
Junfeng Wang,
Daniala Weir,
Vanessa Didelez,
Nan van Geloven
Abstract:
Prediction models are used, among other purposes, to inform medical decisions on interventions. Typically, individuals with high risks of adverse outcomes are advised to undergo an intervention while those at low risk are advised to refrain from it. Standard prediction models do not always provide risks that are relevant to inform such decisions: e.g., an individual may be estimated to be at low risk because similar individuals in the past received an intervention which lowered their risk. Therefore, prediction models supporting decisions should target risks belonging to defined intervention strategies. Previous works on prediction under interventions assumed that the prediction model was used only at one time point to make an intervention decision. In clinical practice, intervention decisions are rarely made only once: they might be repeated, deferred and re-evaluated. This requires estimated risks under interventions that can be reconsidered at several potential decision moments. In the current work, we highlight key considerations for formulating estimands in sequential prediction under interventions that can inform such intervention decisions. We illustrate these considerations by giving examples of estimands for a case study about choosing between vaginal delivery and cesarean section for women giving birth. Our formalization of prediction tasks in a sequential, causal, and estimand context provides guidance for future studies to ensure that the right question is answered and appropriate causal estimation approaches are chosen to develop sequential prediction models that can inform intervention decisions.
Submitted 29 November, 2023;
originally announced November 2023.
-
Attentional Graph Neural Networks for Robust Massive Network Localization
Authors:
Wenzhong Yan,
Juntao Wang,
Feng Yin,
Yang Tian,
Abdelhak M. Zoubir
Abstract:
In recent years, graph neural networks (GNNs) have emerged as a prominent tool for classification tasks in machine learning. However, their application in regression tasks remains underexplored. To tap the potential of GNNs in regression, this paper integrates GNNs with the attention mechanism, a technique that revolutionized sequential learning tasks with its adaptability and robustness, to tackle a challenging nonlinear regression problem: network localization. We first introduce a novel network localization method based on the graph convolutional network (GCN), which exhibits exceptional precision even under severe non-line-of-sight (NLOS) conditions, thereby diminishing the need for laborious offline calibration or NLOS identification. We further propose an attentional graph neural network (AGNN) model, aimed at improving the limited flexibility and mitigating the high sensitivity to the hyperparameter of the GCN-based method. The AGNN comprises two crucial modules, each designed with distinct attention architectures to address specific issues associated with the GCN-based method, rendering it more practical in real-world scenarios. Experimental results substantiate the efficacy of our proposed GCN-based method and AGNN model, as well as the enhancements of the AGNN model. Additionally, we delve into the performance improvements of the AGNN model by analyzing it from the perspectives of dynamic attention and computational complexity.
Submitted 14 February, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Online Prediction of Extreme Conditional Quantiles via B-Spline Interpolation
Authors:
Zhengpin Li,
Jian Wang,
Yanxi Hou
Abstract:
Extreme quantiles are critical for understanding the behavior of data in the tail region of a distribution. It is challenging to estimate extreme quantiles, particularly when dealing with limited data in the tail. In such cases, extreme value theory offers a solution by approximating the tail distribution using the Generalized Pareto Distribution (GPD). This allows for the extrapolation beyond the range of observed data, making it a valuable tool for various applications. However, when it comes to conditional cases, where estimation relies on covariates, existing methods may require computationally expensive GPD fitting for different observations. This computational burden becomes even more problematic as the volume of observations increases, sometimes approaching infinity. To address this issue, we propose an interpolation-based algorithm named EMI. EMI facilitates the online prediction of extreme conditional quantiles with finite offline observations. Combining quantile regression and GPD-based extrapolation, EMI formulates as a bilevel programming problem, efficiently solvable using classic optimization methods. Once estimates for offline observations are obtained, EMI employs B-spline interpolation for covariate-dependent variables, enabling estimation for online observations with finite GPD fitting. Simulations and real data analysis demonstrate the effectiveness of EMI across various scenarios.
Submitted 23 November, 2023;
originally announced November 2023.
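The GPD-based extrapolation mentioned in the abstract above can be illustrated with the textbook peaks-over-threshold quantile estimator. This is a minimal sketch of that standard formula only, not the EMI algorithm itself; the function name `gpd_extreme_quantile` and all parameter names are hypothetical, and the threshold, scale, and shape values are assumed to have been fitted elsewhere.

```python
import math


def gpd_extreme_quantile(u, sigma, xi, n, n_exceed, p):
    """Standard peaks-over-threshold estimate of the p-quantile.

    u        : threshold above which the GPD tail approximation is used
    sigma    : fitted GPD scale (> 0)
    xi       : fitted GPD shape
    n        : total number of observations
    n_exceed : number of observations exceeding u
    p        : target probability level, 0 < p < 1

    All names are illustrative assumptions, not taken from the paper.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if sigma <= 0 or n_exceed <= 0 or n < n_exceed:
        raise ValueError("need sigma > 0 and 0 < n_exceed <= n")
    # Ratio of the target tail probability to the empirical exceedance rate.
    tail = (n / n_exceed) * (1.0 - p)
    if abs(xi) < 1e-12:
        # Limit xi -> 0: the GPD reduces to an exponential tail.
        return u - sigma * math.log(tail)
    return u + (sigma / xi) * (tail ** (-xi) - 1.0)


# Hypothetical fitted values, for illustration only.
q_99 = gpd_extreme_quantile(u=10.0, sigma=2.0, xi=0.2, n=1000, n_exceed=50, p=0.99)
q_999 = gpd_extreme_quantile(u=10.0, sigma=2.0, xi=0.2, n=1000, n_exceed=50, p=0.999)
```

When `p` equals the empirical non-exceedance rate `1 - n_exceed / n`, the estimate collapses to the threshold `u`, which is a quick sanity check on any fitted values.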
-
Optimal Time of Arrival Estimation for MIMO Backscatter Channels
Authors:
Chen He,
Luyang Han,
Z. Jane Wang
Abstract:
In this paper, we propose a novel time of arrival (TOA) estimator for multiple-input-multiple-output (MIMO) backscatter channels in closed form. The proposed estimator refines the estimation precision from the topological structure of the MIMO backscatter channels, and can considerably enhance the estimation accuracy. Particularly, we show that for the general $M \times N$ bistatic topology, the mean square error (MSE) is $\frac{M+N-1}{MN}σ^2_0$, and for the general $M \times M$ monostatic topology, it is $\frac{2M-1}{M^2}σ^2_0$ for the diagonal subchannels, and $\frac{M-1}{M^2}σ^2_0$ for the off-diagonal subchannels, where $σ^2_0$ is the MSE of the conventional least square estimator. In addition, we derive the Cramer-Rao lower bound (CRLB) for MIMO backscatter TOA estimation which indicates that the proposed estimator is optimal. Simulation results verify that the proposed TOA estimator can considerably improve both estimation and positioning accuracy, especially when the MIMO scale is large.
Submitted 22 November, 2023;
originally announced November 2023.
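To make the MSE expressions quoted in the abstract above concrete, the short sketch below evaluates them as exact fractions of the conventional least-squares MSE $σ^2_0$. The function names are hypothetical, and the code only plugs numbers into the formulas as stated in the abstract; it does not implement the estimator.

```python
from fractions import Fraction


def bistatic_mse_ratio(m, n):
    """MSE of the proposed estimator relative to sigma_0^2, M x N bistatic case."""
    return Fraction(m + n - 1, m * n)


def monostatic_mse_ratios(m):
    """(diagonal, off-diagonal) MSE ratios relative to sigma_0^2, M x M monostatic case."""
    diagonal = Fraction(2 * m - 1, m * m)
    off_diagonal = Fraction(m - 1, m * m)
    return diagonal, off_diagonal


# Illustrative antenna counts (assumed values, not from the paper).
for m, n in [(2, 2), (4, 4), (8, 8)]:
    print(f"bistatic {m}x{n}: MSE / sigma_0^2 = {bistatic_mse_ratio(m, n)}")

diag, off = monostatic_mse_ratios(4)
print(f"monostatic 4x4: diagonal = {diag}, off-diagonal = {off}")
```

Under these formulas the ratio shrinks as the array grows, which matches the abstract's remark that the gain is largest when the MIMO scale is large.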
-
Infinite forecast combinations based on Dirichlet process
Authors:
Yinuo Ren,
Feng Li,
Yanfei Kang,
Jue Wang
Abstract:
Forecast combination integrates information from various sources by consolidating multiple forecast results from the target time series. Instead of the need to select a single optimal forecasting model, this paper introduces a deep learning ensemble forecasting model based on the Dirichlet process. Initially, the learning rate is sampled with three basis distributions as hyperparameters to convert the infinite mixture into a finite one. All checkpoints are collected to establish a deep learning sub-model pool, and weight adjustment and diversity strategies are developed during the combination process. The main advantage of this method is its ability to generate the required base learners through a single training process, utilizing the decaying strategy to tackle the challenge posed by the stochastic nature of gradient descent in determining the optimal learning rate. To ensure the method's generalizability and competitiveness, this paper conducts an empirical analysis using the weekly dataset from the M4 competition and explores sensitivity to the number of models to be combined. The results demonstrate that the ensemble model proposed offers substantial improvements in prediction accuracy and stability compared to a single benchmark model.
Submitted 24 November, 2023; v1 submitted 21 November, 2023;
originally announced November 2023.
-
Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making
Authors:
Ting Li,
Chengchun Shi,
Jianing Wang,
Fan Zhou,
Hongtu Zhu
Abstract:
A/B testing is critical for modern technological companies to evaluate the effectiveness of newly developed products against standard baselines. This paper studies optimal designs that aim to maximize the amount of information obtained from online experiments to estimate treatment effects accurately. We propose three optimal allocation strategies in a dynamic setting where treatments are sequentially assigned over time. These strategies are designed to minimize the variance of the treatment effect estimator when data follow a non-Markov decision process or a (time-varying) Markov decision process. We further develop estimation procedures based on existing off-policy evaluation (OPE) methods and conduct extensive experiments in various environments to demonstrate the effectiveness of the proposed methodologies. In theory, we prove the optimality of the proposed treatment allocation design and establish upper bounds for the mean squared errors of the resulting treatment effect estimators.
Submitted 4 November, 2023;
originally announced November 2023.
-
Aggregating Dependent Signals with Heavy-Tailed Combination Tests
Authors:
Lin Gui,
Yuchao Jiang,
Jingshu Wang
Abstract:
Combining dependent p-values to evaluate the global null hypothesis presents a longstanding challenge in statistical inference, particularly when aggregating results from diverse methods to boost signal detection. P-value combination tests using heavy-tailed distribution based transformations, such as the Cauchy combination test and the harmonic mean p-value, have recently garnered significant interest for their potential to efficiently handle arbitrary p-value dependencies. Despite their growing popularity in practical applications, there is a gap in comprehensive theoretical and empirical evaluations of these methods. This paper conducts an extensive investigation, revealing that, theoretically, while these combination tests are asymptotically valid for pairwise quasi-asymptotically independent test statistics, such as bivariate normal variables, they are also asymptotically equivalent to the Bonferroni test under the same conditions. However, extensive simulations unveil their practical utility, especially in scenarios where stringent type-I error control is not necessary and signals are dense. Both the heaviness of the distribution and its support substantially impact the tests' non-asymptotic validity and power, and we recommend using a truncated Cauchy distribution in practice. Moreover, we show that under the violation of quasi-asymptotic independence among test statistics, these tests remain valid and, in fact, can be considerably less conservative than the Bonferroni test. We also present two case studies in genetics and genomics, showcasing the potential of the combination tests to significantly enhance statistical power while effectively controlling type-I errors.
Submitted 31 October, 2023;
originally announced October 2023.
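The two heavy-tailed combination rules named in the abstract above can be sketched in a few lines. This is a minimal illustration of the standard (untruncated) Cauchy combination test and the weighted harmonic mean of p-values, next to a Bonferroni baseline; it is not the truncated Cauchy variant the authors recommend, and the function names and example p-values are assumptions made for illustration.

```python
import math


def _normalized_weights(k, weights):
    """Return weights summing to one (equal weights if none are given)."""
    if weights is None:
        return [1.0 / k] * k
    total = float(sum(weights))
    return [w / total for w in weights]


def cauchy_combination(pvalues, weights=None):
    """Standard Cauchy combination test: map each p-value to a Cauchy
    variate, take the weighted sum, and convert back to a p-value."""
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    w = _normalized_weights(len(pvalues), weights)
    t = sum(wi * math.tan((0.5 - p) * math.pi) for wi, p in zip(w, pvalues))
    return 0.5 - math.atan(t) / math.pi


def harmonic_mean_p(pvalues, weights=None):
    """Weighted harmonic mean of the p-values (the raw statistic, which
    is not an exactly calibrated p-value on its own)."""
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    w = _normalized_weights(len(pvalues), weights)
    return 1.0 / sum(wi / p for wi, p in zip(w, pvalues))


def bonferroni(pvalues):
    """Bonferroni-adjusted global p-value, for comparison."""
    return min(1.0, len(pvalues) * min(pvalues))


# Hypothetical p-values, for illustration only.
example = [0.01, 0.20, 0.03, 0.50]
print(cauchy_combination(example), harmonic_mean_p(example), bonferroni(example))
```

A useful check is that combining identical p-values with either heavy-tailed rule returns that same value, whereas Bonferroni multiplies it by the number of tests.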
-
Fair Adaptive Experiments
Authors:
Waverly Wei,
Xinwei Ma,
Jingshen Wang
Abstract:
Randomized experiments have been the gold standard for assessing the effectiveness of a treatment or policy. The classical complete randomization approach assigns treatments based on a prespecified probability and may lead to inefficient use of data. Adaptive experiments improve upon complete randomization by sequentially learning and updating treatment assignment probabilities. However, their application can also raise fairness and equity concerns, as assignment probabilities may vary drastically across groups of participants. Furthermore, when treatment is expected to be extremely beneficial to certain groups of participants, it is more appropriate to expose many of these participants to favorable treatment. In response to these challenges, we propose a fair adaptive experiment strategy that simultaneously enhances data use efficiency, achieves an envy-free treatment assignment guarantee, and improves the overall welfare of participants. An important feature of our proposed strategy is that we do not impose parametric modeling assumptions on the outcome variables, making it more versatile and applicable to a wider array of applications. Through our theoretical investigation, we characterize the convergence rate of the estimated treatment effects and the associated standard deviations at the group level and further prove that our adaptive treatment assignment algorithm, despite not having a closed-form expression, approaches the optimal allocation rule asymptotically. Our proof strategy takes into account the fact that the allocation decisions in our design depend on sequentially accumulated data, which poses a significant challenge in characterizing the properties and conducting statistical inference of our method. We further provide simulation evidence to showcase the performance of our fair adaptive experiment strategy.
Submitted 24 October, 2023;
originally announced October 2023.
-
Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework
Authors:
Lan Luo,
Chengchun Shi,
Jitao Wang,
Zhenke Wu,
Lexin Li
Abstract:
Mediation analysis is an important analytic tool commonly used in a broad range of scientific applications. In this article, we study the problem of mediation analysis when there are multivariate and conditionally dependent mediators, and when the variables are observed over multiple time points. The problem is challenging, because the effect of a mediator involves not only the path from the treatment to this mediator itself at the current time point, but also all possible paths pointed to this mediator from its upstream mediators, as well as the carryover effects from all previous time points. We propose a novel multivariate dynamic mediation analysis approach. Drawing inspiration from the Markov decision process model that is frequently employed in reinforcement learning, we introduce a Markov mediation process paired with a system of time-varying linear structural equation models to formulate the problem. We then formally define the individual mediation effect, built upon the idea of simultaneous interventions and intervention calculus. We next derive the closed-form expression and propose an iterative estimation procedure under the Markov mediation process model. We study both the asymptotic property and the empirical performance of the proposed estimator, and further illustrate our method with a mobile health application.
Submitted 24 October, 2023;
originally announced October 2023.
-
Structural transfer learning of non-Gaussian DAG
Authors:
Mingyang Ren,
Xin He,
Junhui Wang
Abstract:
Directed acyclic graph (DAG) has been widely employed to represent directional relationships among a set of collected nodes. Yet, the available data in one single study is often limited for accurate DAG reconstruction, whereas heterogeneous data may be collected from multiple relevant studies. It remains an open question how to pool the heterogeneous data together for better DAG structure reconstruction in the target study. In this paper, we first introduce a novel set of structural similarity measures for DAG and then present a transfer DAG learning framework by effectively leveraging information from auxiliary DAGs of different levels of similarities. Our theoretical analysis shows substantial improvement in terms of DAG reconstruction in the target study, even when no auxiliary DAG is overall similar to the target DAG, which is in sharp contrast to most existing transfer learning methods. The advantage of the proposed transfer DAG learning is also supported by extensive numerical experiments on both synthetic data and multi-site brain functional connectivity network data.
Submitted 16 October, 2023;
originally announced October 2023.
-
Two Sides of The Same Coin: Bridging Deep Equilibrium Models and Neural ODEs via Homotopy Continuation
Authors:
Shutong Ding,
Tianyu Cui,
Jingya Wang,
Ye Shi
Abstract:
Deep Equilibrium Models (DEQs) and Neural Ordinary Differential Equations (Neural ODEs) are two branches of implicit models that have achieved remarkable success owing to their superior performance and low memory consumption. While both are implicit models, DEQs and Neural ODEs are derived from different mathematical formulations. Inspired by homotopy continuation, we establish a connection between these two models and illustrate that they are actually two sides of the same coin. Homotopy continuation is a classical method of solving nonlinear equations based on a corresponding ODE. Given this connection, we proposed a new implicit model called HomoODE that inherits the property of high accuracy from DEQs and the property of stability from Neural ODEs. Unlike DEQs, which explicitly solve an equilibrium-point-finding problem via Newton's methods in the forward pass, HomoODE solves the equilibrium-point-finding problem implicitly using a modified Neural ODE via homotopy continuation. Further, we developed an acceleration method for HomoODE with a shared learnable initial point. It is worth noting that our model also provides a better understanding of why Augmented Neural ODEs work as long as the augmented part is regarded as the equilibrium point to find. Comprehensive experiments with several image classification tasks demonstrate that HomoODE surpasses existing implicit models in terms of both accuracy and memory consumption.
Submitted 21 December, 2023; v1 submitted 14 October, 2023;
originally announced October 2023.
-
Change point detection in dynamic heterogeneous networks via subspace tracking
Authors:
Yuzhao Zhang,
Jingnan Zhang,
Yifan Sun,
Junhui Wang
Abstract:
Dynamic networks consist of a sequence of time-varying networks, and it is of great importance to detect the network change points. Most existing methods focus on detecting abrupt change points, necessitating the assumption that the underlying network probability matrix remains constant between adjacent change points. This paper introduces a new model that allows the network probability matrix to undergo continuous shifting, while the latent network structure, represented via the embedding subspace, only changes at certain time points. Two novel statistics are proposed to jointly detect these network subspace change points, followed by a carefully refined detection procedure. Theoretically, we show that the proposed method is asymptotically consistent in terms of change point detection, and also establish the impossibility region for detecting these network subspace change points. The advantage of the proposed method is also supported by extensive numerical experiments on both synthetic networks and a UK politician social network.
Submitted 12 October, 2023;
originally announced October 2023.
-
WASA: WAtermark-based Source Attribution for Large Language Model-Generated Data
Authors:
Jingtan Wang,
Xinyang Lu,
Zitong Zhao,
Zhongxiang Dai,
Chuan-Sheng Foo,
See-Kiong Ng,
Bryan Kian Hsiang Low
Abstract:
The impressive performances of large language models (LLMs) and their immense potential for commercialization have given rise to serious concerns over the intellectual property (IP) of their training data. In particular, the synthetic texts generated by LLMs may infringe the IP of the data being used to train the LLMs. To this end, it is imperative to be able to (a) identify the data provider who contributed to the generation of a synthetic text by an LLM (source attribution) and (b) verify whether the text data from a data provider has been used to train an LLM (data provenance). In this paper, we show that both problems can be solved by watermarking, i.e., by enabling an LLM to generate synthetic texts with embedded watermarks that contain information about their source(s). We identify the key properties of such watermarking frameworks (e.g., source attribution accuracy, robustness against adversaries), and propose a WAtermarking for Source Attribution (WASA) framework that satisfies these key properties due to our algorithmic designs. Our WASA framework enables an LLM to learn an accurate mapping from the texts of different data providers to their corresponding unique watermarks, which sets the foundation for effective source attribution (and hence data provenance). Extensive empirical evaluations show that our WASA framework achieves effective source attribution and data provenance.
Submitted 1 October, 2023;
originally announced October 2023.
-
Flexible Functional Treatment Effect Estimation
Authors:
Jiayi Wang,
Raymond K. W. Wong,
Xiaoke Zhang,
Kwun Chuen Gary Chan
Abstract:
We study treatment effect estimation with functional treatments where the average potential outcome functional is a function of functions, in contrast to continuous treatment effect estimation where the target is a function of real numbers. By considering a flexible scalar-on-function marginal structural model, a weight-modified kernel ridge regression (WMKRR) is adopted for estimation. The weights are constructed by directly minimizing the uniform balancing error resulting from a decomposition of the WMKRR estimator, instead of being estimated under a particular treatment selection model. Despite the complex structure of the uniform balancing error derived under WMKRR, finite-dimensional convex algorithms can be applied to efficiently solve for the proposed weights thanks to a representer theorem. The optimal convergence rate is shown to be attainable by the proposed WMKRR estimator without any smoothness assumption on the true weight function. Corresponding empirical performance is demonstrated by a simulation study and a real data application.
Submitted 14 September, 2023;
originally announced September 2023.
-
Unsupervised Contrast-Consistent Ranking with Language Models
Authors:
Niklas Stoehr,
Pengxiang Cheng,
Jing Wang,
Daniel Preotiuc-Pietro,
Rajarshi Bhowmik
Abstract:
Language models contain ranking-based knowledge and are powerful solvers of in-context ranking tasks. For instance, they may have parametric knowledge about the ordering of countries by size or may be able to rank product reviews by sentiment. We compare pairwise, pointwise and listwise prompting techniques to elicit a language model's ranking knowledge. However, we find that even with careful calibration and constrained decoding, prompting-based techniques may not always be self-consistent in the rankings they produce. This motivates us to explore an alternative approach that is inspired by an unsupervised probing method called Contrast-Consistent Search (CCS). The idea is to train a probe guided by a logical constraint: a language model's representation of a statement and its negation must be mapped to contrastive true-false poles consistently across multiple statements. We hypothesize that similar constraints apply to ranking tasks where all items are related via consistent, pairwise or listwise comparisons. To this end, we extend the binary CCS method to Contrast-Consistent Ranking (CCR) by adapting existing ranking methods such as the Max-Margin Loss, Triplet Loss and an Ordinal Regression objective. Across different models and datasets, our results confirm that CCR probing performs better or, at least, on a par with prompting.
Submitted 3 February, 2024; v1 submitted 13 September, 2023;
originally announced September 2023.
-
Winner's Curse Free Robust Mendelian Randomization with Summary Data
Authors:
Zhongming Xie,
Wanheng Zhang,
Jingshen Wang,
Chong Wu
Abstract:
In the past decade, the increased availability of genome-wide association studies summary data has popularized Mendelian Randomization (MR) for conducting causal inference. MR analyses, incorporating genetic variants as instrumental variables, are known for their robustness against reverse causation bias and unmeasured confounders. Nevertheless, classical MR analyses utilizing summary data may still produce biased causal effect estimates due to the winner's curse and pleiotropic issues. To address these two issues and establish valid causal conclusions, we propose a unified robust Mendelian Randomization framework with summary data, which systematically removes the winner's curse and screens out invalid genetic instruments with pleiotropic effects. Different from existing robust MR literature, our framework delivers valid statistical inference on the causal effect neither requiring the genetic pleiotropy effects to follow any parametric distribution nor relying on perfect instrument screening property. Under appropriate conditions, we show that our proposed estimator converges to a normal distribution and its variance can be well estimated. We demonstrate the performance of our proposed estimator through Monte Carlo simulations and two case studies. The codes implementing the procedures are available at https://github.com/ChongWuLab/CARE/.
Submitted 10 September, 2023;
originally announced September 2023.
-
Perceptual adjustment queries and an inverted measurement paradigm for low-rank metric learning
Authors:
Austin Xu,
Andrew D. McRae,
Jingyan Wang,
Mark A. Davenport,
Ashwin Pananjady
Abstract:
We introduce a new type of query mechanism for collecting human feedback, called the perceptual adjustment query (PAQ). Being both informative and cognitively lightweight, the PAQ adopts an inverted measurement scheme, and combines advantages from both cardinal and ordinal queries. We showcase the PAQ in the metric learning problem, where we collect PAQ measurements to learn an unknown Mahalanobis distance. This gives rise to a high-dimensional, low-rank matrix estimation problem to which standard matrix estimators cannot be applied. Consequently, we develop a two-stage estimator for metric learning from PAQs, and provide sample complexity guarantees for this estimator. We present numerical simulations demonstrating the performance of the estimator and its notable properties.
Submitted 8 September, 2023;
originally announced September 2023.