-
Diffusion Boosted Trees
Authors:
Xizewen Han,
Mingyuan Zhou
Abstract:
Combining the merits of both denoising diffusion probabilistic models and gradient boosting, the diffusion boosting paradigm is introduced for tackling supervised learning problems. We develop Diffusion Boosted Trees (DBT), which can be viewed as both a new denoising diffusion generative model parameterized by decision trees (one single tree for each diffusion timestep), and a new boosting algorithm that combines the weak learners into a strong learner of conditional distributions without making explicit parametric assumptions on their density forms. We demonstrate through experiments the advantages of DBT over deep neural network-based diffusion models as well as the competence of DBT on real-world regression tasks, and present a business application (fraud detection) of DBT for classification on tabular data with the ability of learning to defer.
Submitted 3 June, 2024;
originally announced June 2024.
-
Towards Scalable Automated Alignment of LLMs: A Survey
Authors:
Boxi Cao,
Keming Lu,
Xinyu Lu,
Jiawei Chen,
Mengjie Ren,
Hao Xiang,
Peilin Liu,
Yaojie Lu,
Ben He,
Xianpei Han,
Le Sun,
Hongyu Lin,
Bowen Yu
Abstract:
Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approaches. In this paper, we systematically review the recently emerging methods of automated alignment, attempting to explore how to achieve effective, scalable, automated alignment once the capabilities of LLMs exceed those of humans. Specifically, we categorize existing automated alignment methods into 4 major categories based on the sources of alignment signals and discuss the current status and potential development of each category. Additionally, we explore the underlying mechanisms that enable automated alignment and discuss the essential factors that make automated alignment technologies feasible and effective from the fundamental role of alignment.
Submitted 3 June, 2024;
originally announced June 2024.
-
Novel Node Category Detection Under Subpopulation Shift
Authors:
Hsing-Huan Chung,
Shravan Chaudhari,
Yoav Wald,
Xing Han,
Joydeep Ghosh
Abstract:
In real-world graph data, distribution shifts can manifest in various ways, such as the emergence of new categories and changes in the relative proportions of existing categories. It is often important to detect nodes of novel categories under such distribution shifts for safety or insight discovery purposes. We introduce a new approach, Recall-Constrained Optimization with Selective Link Prediction (RECO-SLIP), to detect nodes belonging to novel categories in attributed graphs under subpopulation shifts. By integrating a recall-constrained learning framework with a sample-efficient link prediction mechanism, RECO-SLIP addresses the dual challenges of resilience against subpopulation shifts and the effective exploitation of graph structure. Our extensive empirical evaluation across multiple graph datasets demonstrates the superior performance of RECO-SLIP over existing methods. The experimental code is available at https://github.com/hsinghuan/novel-node-category-detection.
Submitted 30 June, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
CATS: Enhancing Multivariate Time Series Forecasting by Constructing Auxiliary Time Series as Exogenous Variables
Authors:
Jiecheng Lu,
Xu Han,
Yan Sun,
Shihao Yang
Abstract:
For Multivariate Time Series Forecasting (MTSF), recent deep learning applications show that univariate models frequently outperform multivariate ones. To address the deficiency in multivariate models, we introduce a method to Construct Auxiliary Time Series (CATS) that functions like a 2D temporal-contextual attention mechanism, which generates Auxiliary Time Series (ATS) from Original Time Series (OTS) to effectively represent and incorporate inter-series relationships for forecasting. Key principles of ATS - continuity, sparsity, and variability - are identified and implemented through different modules. Even with a basic 2-layer MLP as core predictor, CATS achieves state-of-the-art performance, significantly reducing complexity and parameters compared to previous multivariate models, making it an efficient and transferable MTSF solution.
Submitted 3 March, 2024;
originally announced March 2024.
-
A Causal Framework to Evaluate Racial Bias in Law Enforcement Systems
Authors:
Jessy Xinyi Han,
Andrew Miller,
S. Craig Watkins,
Christopher Winship,
Fotini Christia,
Devavrat Shah
Abstract:
We are interested in developing a data-driven method to evaluate race-induced biases in law enforcement systems. While recent works have addressed this question in the context of police-civilian interactions using police stop data, they have two key limitations. First, bias can only be properly quantified if true criminality is accounted for in addition to race, but it is absent in prior works. Second, law enforcement systems are multi-stage and hence it is important to isolate the true source of bias within the "causal chain of interactions" rather than simply focusing on the end outcome; this can help guide reforms. In this work, we address these challenges by presenting a multi-stage causal framework incorporating criminality. We provide a theoretical characterization and an associated data-driven method to evaluate (a) the presence of any form of racial bias, and (b) if so, the primary source of such a bias in terms of race and criminality. Our framework identifies three canonical scenarios with distinct characteristics: in settings like (1) airport security, the primary source of observed bias against a race is likely to be bias in law enforcement against innocents of that race; (2) AI-empowered policing, the primary source of observed bias against a race is likely to be bias in law enforcement against criminals of that race; and (3) police-civilian interaction, the primary source of observed bias against a race could be bias in law enforcement against that race or bias from the general public in reporting against the other race. Through an extensive empirical study using police-civilian interaction data and 911 call data, we find an instance of such a counter-intuitive phenomenon: in New Orleans, the observed bias is against the majority race and the likely reason for it is the over-reporting (via 911 calls) of incidents involving the minority race by the general public.
Submitted 20 March, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
ARM: Refining Multivariate Forecasting with Adaptive Temporal-Contextual Learning
Authors:
Jiecheng Lu,
Xu Han,
Shihao Yang
Abstract:
Long-term time series forecasting (LTSF) is important for various domains but is confronted by challenges in handling the complex temporal-contextual relationships. As multivariate input models underperform some recent univariate counterparts, we posit that the issue lies in the inefficiency of existing multivariate LTSF Transformers in modeling series-wise relationships: the characteristic differences between series are often captured incorrectly. To address this, we introduce ARM: a multivariate temporal-contextual adaptive learning method, which is an enhanced architecture specifically designed for multivariate LTSF modelling. ARM employs Adaptive Univariate Effect Learning (AUEL), a Random Dropping (RD) training strategy, and Multi-kernel Local Smoothing (MKLS) to better handle individual series temporal patterns and correctly learn inter-series dependencies. ARM demonstrates superior performance on multiple benchmarks without significantly increasing computational costs compared to the vanilla Transformer, thereby advancing the state-of-the-art in LTSF. ARM is also generally applicable to other LTSF architectures beyond the vanilla Transformer.
Submitted 14 October, 2023;
originally announced October 2023.
-
Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces
Authors:
Zhou Fan,
Xinran Han,
Zi Wang
Abstract:
Bayesian optimization (BO) is a popular black-box function optimization method, which makes sequential decisions based on a Bayesian model, typically a Gaussian process (GP), of the function. To ensure the quality of the model, transfer learning approaches have been developed to automatically design GP priors by learning from observations on "training" functions. These training functions are typically required to have the same domain as the "test" function (black-box function to be optimized). In this paper, we introduce MPHD, a model pre-training method on heterogeneous domains, which uses a neural net mapping from domain-specific contexts to specifications of hierarchical GPs. MPHD can be seamlessly integrated with BO to transfer knowledge across heterogeneous search spaces. Our theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks.
Submitted 13 February, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
A Majorization-Minimization Gauss-Newton Method for 1-Bit Matrix Completion
Authors:
Xiaoqian Liu,
Xu Han,
Eric C. Chi,
Boaz Nadler
Abstract:
In 1-bit matrix completion, the aim is to estimate an underlying low-rank matrix from a partial set of binary observations. We propose a novel method for 1-bit matrix completion called MMGN. Our method is based on the majorization-minimization (MM) principle, which converts the original optimization problem into a sequence of standard low-rank matrix completion problems. We solve each of these sub-problems by a factorization approach that explicitly enforces the assumed low-rank structure and then apply a Gauss-Newton method. Using simulations and a real data example, we illustrate that in comparison to existing 1-bit matrix completion methods, MMGN outputs comparable if not more accurate estimates. In addition, it is often significantly faster, and less sensitive to the spikiness of the underlying matrix. In comparison with three standard generic optimization approaches that directly minimize the original objective, MMGN also exhibits a clear computational advantage, especially when the fraction of observed entries is small.
Submitted 22 April, 2024; v1 submitted 26 April, 2023;
originally announced April 2023.
-
HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes
Authors:
Zhou Fan,
Xinran Han,
Zi Wang
Abstract:
Bayesian optimization (BO), while proven highly effective for many black-box function optimization tasks, requires practitioners to carefully select priors that model their functions of interest well. Rather than specifying priors by hand, researchers have investigated transfer learning based methods to automatically learn the priors, e.g. multi-task BO (Swersky et al., 2013), few-shot BO (Wistuba and Grabocka, 2021) and HyperBO (Wang et al., 2022). However, those prior learning methods typically assume that the input domains are the same for all tasks, weakening their ability to use observations on functions with different domains or generalize the learned priors to BO on different search spaces. In this work, we present HyperBO+: a pre-training approach for hierarchical Gaussian processes that enables the same prior to work universally for Bayesian optimization on functions with different domains. We propose a two-step pre-training method and analyze its appealing asymptotic properties and benefits to BO both theoretically and empirically. On real-world hyperparameter tuning tasks that involve multiple search spaces, we demonstrate that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.
Submitted 28 September, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Towards Accurate Subgraph Similarity Computation via Neural Graph Pruning
Authors:
Linfeng Liu,
Xu Han,
Dawei Zhou,
Li-Ping Liu
Abstract:
Subgraph similarity search, one of the core problems in graph search, concerns whether a target graph approximately contains a query graph. The problem has recently been addressed by neural methods. However, current neural methods do not consider pruning the target graph, though pruning is critically important in traditional calculations of subgraph similarities. One obstacle to applying pruning in neural methods is the discrete property of pruning. In this work, we convert graph pruning to a problem of node relabeling and then relax it to a differentiable problem. Based on this idea, we further design a novel neural network to approximate a type of subgraph distance: the subgraph edit distance (SED). In particular, we construct the pruning component using a neural structure, and the entire model can be optimized end-to-end. In the design of the model, we propose an attention mechanism to leverage the information about the query graph and guide the pruning of the target graph. Moreover, we develop a multi-head pruning strategy such that the model can better explore multiple ways of pruning the target graph. The proposed model establishes new state-of-the-art results across seven benchmark datasets. Extensive analysis of the model indicates that the proposed model can reasonably prune the target graph for SED computation. The implementation of our algorithm is released at our GitHub repo: https://github.com/tufts-ml/Prune4SED.
Submitted 19 October, 2022;
originally announced October 2022.
-
Survival Mixture Density Networks
Authors:
Xintian Han,
Mark Goldstein,
Rajesh Ranganath
Abstract:
Survival analysis, the art of time-to-event modeling, plays an important role in clinical treatment decisions. Recently, continuous time models built from neural ODEs have been proposed for survival analysis. However, the training of neural ODEs is slow due to the high computational complexity of neural ODE solvers. Here, we propose an efficient alternative for flexible continuous time models, called Survival Mixture Density Networks (Survival MDNs). Survival MDN applies an invertible positive function to the output of Mixture Density Networks (MDNs). While MDNs produce flexible real-valued distributions, the invertible positive function maps the model into the time-domain while preserving a tractable density. Using four datasets, we show that Survival MDN performs better than, or similarly to continuous and discrete time baselines on concordance, integrated Brier score and integrated binomial log-likelihood. Meanwhile, Survival MDNs are also faster than ODE-based models and circumvent binning issues in discrete models.
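The "tractable density" claim follows from the standard change-of-variables formula. As a sketch only: the abstract does not name the invertible positive function, so $g$ below is a generic assumption (a differentiable, strictly increasing map from the reals onto the positive reals), and $Z$ with density $p_Z$ stands for the real-valued MDN output.

```latex
% Assumed notation: Z ~ p_Z (MDN output), g : \mathbb{R} \to (0, \infty) invertible and increasing.
\[
T = g(Z), \qquad
p_T(t) = p_Z\!\big(g^{-1}(t)\big)\,\left|\frac{d}{dt}\, g^{-1}(t)\right|,
\qquad
S_T(t) = 1 - F_Z\!\big(g^{-1}(t)\big), \quad t > 0 .
\]
% Illustrative choice g = \exp (an assumption, not taken from the abstract):
\[
p_T(t) = \frac{p_Z(\log t)}{t}, \qquad S_T(t) = 1 - F_Z(\log t).
\]
```

Because $p_Z$ and $F_Z$ are available in closed form for a Gaussian mixture, both the density and the survival function of $T$ can be evaluated without an ODE solver.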
Submitted 23 August, 2022;
originally announced August 2022.
-
Choquet regularization for reinforcement learning
Authors:
Xia Han,
Ruodu Wang,
Xun Yu Zhou
Abstract:
We propose \emph{Choquet regularizers} to measure and manage the level of exploration for reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)) in which we replace the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton--Jacobi--Bellman equation of the problem, and solve it explicitly in the linear--quadratic (LQ) case via maximizing statically a mean--variance constrained Choquet regularizer. Under the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers, and conversely identify the Choquet regularizers that generate a number of broadly used exploratory samplers such as $ε$-greedy, exponential, uniform and Gaussian.
Submitted 17 August, 2022;
originally announced August 2022.
-
Split Localized Conformal Prediction
Authors:
Xing Han,
Ziyang Tang,
Joydeep Ghosh,
Qiang Liu
Abstract:
Conformal prediction is a simple and powerful tool that can quantify uncertainty without any distributional assumptions. Many existing methods only address the average coverage guarantee, which is not ideal compared to the stronger conditional coverage guarantee. Existing methods of approximating conditional coverage require additional models or time effort, which makes them not easy to scale. In this paper, we propose a modified non-conformity score by leveraging the local approximation of the conditional distribution using kernel density estimation. The modified score inherits the spirit of split conformal methods, which is simple and efficient and can scale to high dimensional settings. We also propose a unified framework that brings together our method and several state-of-the-art methods. We perform extensive empirical evaluations: results measured by both average and conditional coverage confirm the advantage of our method.
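For orientation, the plain split conformal baseline that the abstract builds on can be sketched as follows. This is the standard average-coverage procedure with absolute-residual scores, not the localized score proposed in the paper; the function name `split_conformal_interval`, the synthetic data, and the fixed predictor `2 * x` are all hypothetical choices made for illustration.

```python
import numpy as np

def split_conformal_interval(scores_cal, y_pred_test, alpha=0.1):
    """Standard split conformal intervals from calibration scores |y - y_hat|."""
    n = len(scores_cal)
    # Finite-sample corrected rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the rank exceeds n, the interval must be infinite to keep the guarantee.
    q = np.inf if k > n else np.sort(scores_cal)[k - 1]
    return y_pred_test - q, y_pred_test + q

rng = np.random.default_rng(0)

# Hypothetical setup: a fixed predictor y_hat = 2 * x on data y = 2 * x + noise.
x_cal = rng.uniform(-1, 1, 1000)
y_cal = 2 * x_cal + rng.normal(0, 0.5, 1000)
x_test = rng.uniform(-1, 1, 5000)
y_test = 2 * x_test + rng.normal(0, 0.5, 5000)

scores = np.abs(y_cal - 2 * x_cal)
lo, hi = split_conformal_interval(scores, 2 * x_test, alpha=0.1)

# Empirical average coverage on the test set; should be close to 1 - alpha.
coverage = np.mean((y_test >= lo) & (y_test <= hi))
print(f"empirical coverage: {coverage:.3f}")
```

The interval width here is the same for every test point, which is why this baseline only targets average coverage; a localized score would let the width vary with the input.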
Submitted 20 February, 2023; v1 submitted 27 June, 2022;
originally announced June 2022.
-
CARD: Classification and Regression Diffusion Models
Authors:
Xizewen Han,
Huangjie Zheng,
Mingyuan Zhou
Abstract:
Learning the distribution of a continuous or categorical response variable $\boldsymbol y$ given its covariates $\boldsymbol x$ is a fundamental problem in statistics and machine learning. Deep neural network-based supervised learning algorithms have made great progress in predicting the mean of $\boldsymbol y$ given $\boldsymbol x$, but they are often criticized for their limited ability to accurately capture the uncertainty of their predictions. In this paper, we introduce classification and regression diffusion (CARD) models, which combine a denoising diffusion-based conditional generative model and a pre-trained conditional mean estimator, to accurately predict the distribution of $\boldsymbol y$ given $\boldsymbol x$. We demonstrate the outstanding ability of CARD in conditional distribution prediction with both toy examples and real-world datasets, the experimental results on which show that CARD in general outperforms state-of-the-art methods, including Bayesian neural network-based ones that are designed for uncertainty estimation, especially when the conditional distribution of $\boldsymbol y$ given $\boldsymbol x$ is multi-modal. In addition, we utilize the stochastic nature of the generative model outputs to obtain a finer granularity in model confidence assessment at the instance level for classification tasks.
Submitted 6 December, 2022; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Quantum Kerr Learning
Authors:
Junyu Liu,
Changchun Zhong,
Matthew Otten,
Anirban Chandra,
Cristian L. Cortes,
Chaoyang Ti,
Stephen K Gray,
Xu Han
Abstract:
Quantum machine learning is a rapidly evolving field of research that could facilitate important applications for quantum computing and also significantly impact data-driven sciences. In our work, based on various arguments from complexity theory and physics, we demonstrate that a single Kerr mode can provide some "quantum enhancements" when dealing with kernel-based methods. Using kernel properties, neural tangent kernel theory, first-order perturbation theory of the Kerr non-linearity, and non-perturbative numerical simulations, we show that quantum enhancements could happen in terms of convergence time and generalization error. Furthermore, we make explicit indications on how higher-dimensional input data could be considered. Finally, we propose an experimental protocol, that we call \emph{quantum Kerr learning}, based on circuit QED.
Submitted 30 November, 2022; v1 submitted 20 May, 2022;
originally announced May 2022.
-
Inverse-Weighted Survival Games
Authors:
Xintian Han,
Mark Goldstein,
Aahlad Puli,
Thomas Wies,
Adler J Perotte,
Rajesh Ranganath
Abstract:
Deep models trained through maximum likelihood have achieved state-of-the-art results for survival analysis. Despite this training scheme, practitioners evaluate models under other criteria, such as binary classification losses at a chosen set of time horizons, e.g. Brier score (BS) and Bernoulli log likelihood (BLL). Models trained with maximum likelihood may have poor BS or BLL since maximum likelihood does not directly optimize these criteria. Directly optimizing criteria like BS requires inverse-weighting by the censoring distribution. However, estimating the censoring model under these metrics requires inverse-weighting by the failure distribution. The objective for each model requires the other, but neither is known. To resolve this dilemma, we introduce Inverse-Weighted Survival Games. In these games, objectives for each model are built from re-weighted estimates featuring the other model, where the latter is held fixed during training. When the loss is proper, we show that the games always have the true failure and censoring distributions as a stationary point. This means models in the game do not leave the correct distributions once reached. We construct one case where this stationary point is unique. We show that these games optimize BS on simulations and then apply these principles on real-world cancer and critically-ill patient data.
Submitted 31 January, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Scientists are Working Overtime and at the Weekends: Comparison of Publication Downloading from Copyrighted and Pirated Platforms
Authors:
Yu Geng,
Ren-Meng Cao,
Xiao-Pu Han,
Wen-Can Tian,
Guang-Yao Zhang,
Xian-Wen Wang
Abstract:
In this study, we track and analyze publication downloads from both copyrighted and pirated platforms to reconstruct scientists' activity patterns from a holistic perspective. Scientists around the world are working overtime, but scientists in different countries have different working patterns. Scientists' preferences for different platforms are influenced by a variety of factors such as working times and workplace arrangements. There are variations by country in terms of whether scientists prefer to work overtime at night, at the weekend, or both at night and at the weekend. When scientists are working overtime, they prefer to use Sci-Hub rather than copyrighted platforms to access scholarly publications. This may be because of the transition in their working scenarios as they move from the office to home outside of work hours.
Submitted 4 November, 2021;
originally announced November 2021.
-
Large-Scale Multiple Testing for Matrix-Valued Data under Double Dependency
Authors:
Xu Han,
Sanat Sarkar,
Shiyu Zhang
Abstract:
High-dimensional inference based on matrix-valued data has drawn increasing attention in modern statistical research, yet not much progress has been made in large-scale multiple testing specifically designed for analysing such data sets. Motivated by this, we consider in this article an electroencephalography (EEG) experiment that produces matrix-valued data and presents scope for developing novel matrix-valued data based multiple testing methods controlling false discoveries for hypotheses that are of importance in such an experiment. The row-column cross-dependency of observations appearing in a matrix form, referred to as double-dependency, is one of the main challenges in the development of such methods. We address it by assuming a matrix normal distribution for the observations at each of the independent matrix data-points. This allows us to fully capture the underlying double-dependency informed through the row- and column-covariance matrices and develop methods that are potentially more powerful than the corresponding one (e.g., Fan and Han (2017)) obtained by vectorizing each data point and thus ignoring the double-dependency. We propose two methods to approximate the false discovery proportion with statistical accuracy. While one of these methods is a general approach under double-dependency, the other one provides more computational efficiency for higher dimensionality. Extensive numerical studies illustrate the superior performance of the proposed methods over the principal factor approximation method of Fan and Han (2017). The proposed methods have been further applied to the aforementioned EEG data.
Submitted 17 June, 2021;
originally announced June 2021.
-
Pre-processing with Orthogonal Decompositions for High-dimensional Explanatory Variables
Authors:
Xu Han,
Ethan X Fang,
Cheng Yong Tang
Abstract:
Strong correlations between explanatory variables are problematic for high-dimensional regularized regression methods. Due to the violation of the Irrepresentable Condition, the popular LASSO method may suffer from false inclusions of inactive variables. In this paper, we propose pre-processing with orthogonal decompositions (PROD) for the explanatory variables in high-dimensional regressions. The PROD procedure is constructed based upon a generic orthogonal decomposition of the design matrix. We demonstrate by two concrete cases that the PROD approach can be effectively constructed for improving the performance of high-dimensional penalized regression. Our theoretical analysis reveals their properties and benefits for high-dimensional penalized linear regression with LASSO. Extensive numerical studies with simulations and data analysis show the promising performance of the PROD.
Submitted 16 June, 2021;
originally announced June 2021.
-
Nonparametric Empirical Bayes Estimation and Testing for Sparse and Heteroscedastic Signals
Authors:
Junhui Cai,
Xu Han,
Ya'acov Ritov,
Linda Zhao
Abstract:
Large-scale modern data often involves estimation and testing for high-dimensional unknown parameters. It is desirable to identify the sparse signals, ``the needles in the haystack'', with accuracy and false discovery control. However, the unprecedented complexity and heterogeneity in modern data structure require new machine learning tools to effectively exploit commonalities and to robustly adjust for both sparsity and heterogeneity. In addition, estimates for high-dimensional parameters often lack uncertainty quantification. In this paper, we propose a novel Spike-and-Nonparametric mixture prior (SNP) -- a spike to promote the sparsity and a nonparametric structure to capture signals. In contrast to the state-of-the-art methods, the proposed methods solve the estimation and testing problem at once with several merits: 1) an accurate sparsity estimation; 2) point estimates with shrinkage/soft-thresholding property; 3) credible intervals for uncertainty quantification; 4) an optimal multiple testing procedure that controls false discovery rate. Our method exhibits promising empirical performance on both simulated data and a gene expression case study.
Submitted 5 November, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
-
Skilled Mutual Fund Selection: False Discovery Control under Dependence
Authors:
Lijia Wang,
Xu Han,
Xin Tong
Abstract:
Selecting skilled mutual funds through the multiple testing framework has received increasing attention from finance researchers and statisticians. The intercept $α$ of the Carhart four-factor model is commonly used to measure the true performance of mutual funds, and funds with positive $α$ are considered skilled. We observe that the standardized OLS estimates of the $α$'s across the funds possess strong dependence and nonnormality structures, indicating that the conventional multiple testing methods are inadequate for selecting the skilled funds. We start from a decision theoretic perspective, and propose an optimal testing procedure to minimize a combination of false discovery rate and false non-discovery rate. Our proposed testing procedure is constructed based on the probability of each fund not being skilled conditional on the information across all of the funds in our study. To model the distribution of the information used for the testing procedure, we consider a mixture model under dependence and propose a new method called "approximate empirical Bayes" to fit the parameters. Empirical studies show that our selected skilled funds have superior long-term and short-term performance, e.g., our selection strongly outperforms the S&P 500 index during the same period.
Submitted 25 February, 2022; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Order Matters: Probabilistic Modeling of Node Sequence for Graph Generation
Authors:
Xiaohui Chen,
Xu Han,
Jiajing Hu,
Francisco J. R. Ruiz,
Liping Liu
Abstract:
A graph generative model defines a distribution over graphs. One type of generative model is constructed by autoregressive neural networks, which sequentially add nodes and edges to generate a graph. However, the likelihood of a graph under the autoregressive model is intractable, as there are numerous sequences leading to the given graph; this makes maximum likelihood estimation challenging. Instead, in this work we derive the exact joint probability over the graph and the node ordering of the sequential process. From the joint, we approximately marginalize out the node orderings and compute a lower bound on the log-likelihood using variational inference. We train graph generative models by maximizing this bound, without using the ad-hoc node orderings of previous methods. Our experiments show that the log-likelihood bound is significantly tighter than the bound of previous schemes. Moreover, the models fitted with the proposed algorithm can generate high-quality graphs that match the structures of target graphs not seen during training. We have made our code publicly available at https://github.com/tufts-ml/graph-generation-vi.
Submitted 14 June, 2021; v1 submitted 11 June, 2021;
originally announced June 2021.
-
Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path
Authors:
X. Y. Han,
Vardan Papyan,
David L. Donoho
Abstract:
The recently discovered Neural Collapse (NC) phenomenon occurs pervasively in today's deep net training paradigm of driving cross-entropy (CE) loss towards zero. During NC, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Recent works demonstrated that deep nets trained with mean squared error (MSE) loss perform comparably to those trained with CE. As a preliminary, we empirically establish that NC emerges in such MSE-trained deep nets as well through experiments on three canonical networks and five benchmark datasets. We provide, in a Google Colab notebook, PyTorch code for reproducing MSE-NC and CE-NC at https://colab.research.google.com/github/neuralcollapse/neuralcollapse/blob/main/neuralcollapse.ipynb. The analytically-tractable MSE loss offers more mathematical opportunities than the hard-to-analyze CE loss, inspiring us to leverage MSE loss towards the theoretical investigation of NC. We develop three main contributions: (I) We show a new decomposition of the MSE loss into (A) terms directly interpretable through the lens of NC and which assume the last-layer classifier is exactly the least-squares classifier; and (B) a term capturing the deviation from this least-squares classifier. (II) We exhibit experiments on canonical datasets and networks demonstrating that term-(B) is negligible during training. This motivates us to introduce a new theoretical construct: the central path, where the linear classifier stays MSE-optimal for feature activations throughout the dynamics. (III) By studying renormalized gradient flow along the central path, we derive exact dynamics that predict NC.
Submitted 9 May, 2022; v1 submitted 3 June, 2021;
originally announced June 2021.
-
X-CAL: Explicit Calibration for Survival Analysis
Authors:
Mark Goldstein,
Xintian Han,
Aahlad Puli,
Adler J. Perotte,
Rajesh Ranganath
Abstract:
Survival analysis models the distribution of time until an event of interest, such as discharge from the hospital or admission to the ICU. When a model's predicted number of events within any time interval is similar to the observed number, it is called well-calibrated. A survival model's calibration can be measured using, for instance, distributional calibration (D-CALIBRATION) [Haider et al., 2020] which computes the squared difference between the observed and predicted number of events within different time intervals. Classically, calibration is addressed in post-training analysis. We develop explicit calibration (X-CAL), which turns D-CALIBRATION into a differentiable objective that can be used in survival modeling alongside maximum likelihood estimation and other objectives. X-CAL allows practitioners to directly optimize calibration and strike a desired balance between predictive power and calibration. In our experiments, we fit a variety of shallow and deep models on simulated data, a survival dataset based on MNIST, on length-of-stay prediction using MIMIC-III data, and on brain cancer data from The Cancer Genome Atlas. We show that the models we study can be miscalibrated. We give experimental evidence on these datasets that X-CAL improves D-CALIBRATION without a large decrease in concordance or likelihood.
Submitted 13 January, 2021;
originally announced January 2021.
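The D-CALIBRATION measure mentioned in the X-CAL abstract above can be illustrated with a small sketch. It assumes one common formulation: the predicted CDF is evaluated at each observed event time, the resulting values are binned over [0, 1], and the squared gap between the observed and expected bin fractions is summed. The function name `d_calibration`, the equal-width bins, and the omission of censoring are assumptions of this sketch, not details taken from the paper.

```python
def d_calibration(cdf_at_event, num_bins=10):
    """Sum of squared gaps between observed and expected bin fractions.

    cdf_at_event: predicted CDF values F(t_i | x_i) at each observed event
    time, each in [0, 1]. Censored observations are ignored in this sketch.
    """
    counts = [0] * num_bins
    for u in cdf_at_event:
        # A value of exactly 1.0 is clamped into the last bin.
        idx = min(int(u * num_bins), num_bins - 1)
        counts[idx] += 1
    n = len(cdf_at_event)
    expected = 1.0 / num_bins  # a calibrated model spreads values uniformly
    return sum((c / n - expected) ** 2 for c in counts)


# Evenly spread CDF values give a score near zero; piled-up values do not.
well_spread = [0.05 + 0.1 * k for k in range(10)]
piled_up = [0.5] * 10
print(d_calibration(well_spread), d_calibration(piled_up))
```

Because the binning step is piecewise constant, this score is not differentiable as written; the abstract describes X-CAL as turning the measure into a differentiable objective, which this sketch does not attempt.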
-
Individual-centered partial information in social networks
Authors:
Xiao Han,
Y. X. Rachel Wang,
Qing Yang,
Xin Tong
Abstract:
In statistical network analysis, we often assume either the full network is available or multiple subgraphs can be sampled to estimate various global properties of the network. However, in a real social network, people frequently make decisions based on their local view of the network alone. Here, we consider a partial information framework that characterizes the local network centered at a given individual by path length $L$ and gives rise to a partial adjacency matrix. Under $L=2$, we focus on the problem of (global) community detection using the popular stochastic block model (SBM) and its degree-corrected variant (DCSBM). We derive theoretical properties of the eigenvalues and eigenvectors from the signal term of the partial adjacency matrix and propose new spectral-based community detection algorithms that achieve consistency under appropriate conditions. Our analysis also allows us to propose a new centrality measure that assesses the importance of an individual's partial information in determining global community structure. Using simulated and real networks, we demonstrate the performance of our algorithms and compare our centrality measure with other popular alternatives to show it captures unique nodal information. Our results illustrate that the partial information framework enables us to compare the viewpoints of different individuals regarding the global structure.
Submitted 2 July, 2024; v1 submitted 1 October, 2020;
originally announced October 2020.
-
Prevalence of Neural Collapse during the terminal phase of deep learning training
Authors:
Vardan Papyan,
X. Y. Han,
David L. Donoho
Abstract:
Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where training error first vanishes; during TPT, the training error stays effectively zero while training loss is pushed towards zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasive inductive bias we call Neural Collapse, involving four deeply interconnected phenomena: (NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class-means; (NC2) The class-means collapse to the vertices of a Simplex Equiangular Tight Frame (ETF); (NC3) Up to rescaling, the last-layer classifiers collapse to the class-means, or in other words to the Simplex ETF, i.e. to a self-dual configuration; (NC4) For a given activation, the classifier's decision collapses to simply choosing whichever class has the closest train class-mean, i.e. the Nearest Class Center (NCC) decision rule. The symmetric and very simple geometry induced by the TPT confers important benefits, including better generalization performance, better robustness, and better interpretability.
Submitted 21 August, 2020; v1 submitted 18 August, 2020;
originally announced August 2020.
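Two of the quantities named in the Neural Collapse abstract above, class-means and the Nearest Class Center (NCC) rule of NC4, are simple enough to sketch. The helper names and toy activations below are hypothetical, and `within_class_variability` is only a simplified proxy for the NC1 measurement (mean squared distance to the class-mean), not the statistic used in the paper.

```python
from math import dist


def class_means(features, labels):
    """Average the feature vectors belonging to each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def ncc_predict(x, means):
    """NCC rule: choose the class whose train class-mean is closest to x."""
    return min(means, key=lambda y: dist(x, means[y]))


def within_class_variability(features, labels, means):
    """Mean squared distance of each activation to its own class-mean."""
    total = sum(dist(x, means[y]) ** 2 for x, y in zip(features, labels))
    return total / len(features)


# Toy last-layer activations for two classes (illustrative values only).
features = [[1.0, 0.1], [0.9, -0.1], [-1.0, 0.0], [-1.1, 0.2]]
labels = [0, 0, 1, 1]
means = class_means(features, labels)
print(ncc_predict([0.8, 0.0], means))
print(within_class_variability(features, labels, means))
```

If the activations collapse exactly onto their class-means, the proxy above returns zero, which is the limiting behavior NC1 describes.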
-
AutoRec: An Automated Recommender System
Authors:
Ting-Hsiang Wang,
Qingquan Song,
Xiaotian Han,
Zirui Liu,
Haifeng Jin,
Xia Hu
Abstract:
Realistic recommender systems are often required to adapt to ever-changing data and tasks or to explore different models systematically. To address the need, we present AutoRec, an open-source automated machine learning (AutoML) platform extended from the TensorFlow ecosystem and, to our knowledge, the first framework to leverage AutoML for model search and hyperparameter tuning in deep recommendation models. AutoRec also supports a highly flexible pipeline that accommodates both sparse and dense inputs, rating prediction and click-through rate (CTR) prediction tasks, and an array of recommendation models. Lastly, AutoRec provides a simple, user-friendly API. Experiments conducted on the benchmark datasets reveal AutoRec is reliable and can identify models which resemble the best model without prior knowledge.
Submitted 26 June, 2020;
originally announced July 2020.
-
Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach
Authors:
Carlos Fernández-Loría,
Foster Provost,
Xintian Han
Abstract:
We examine counterfactual explanations for explaining the decisions made by model-based AI systems. The counterfactual approach we consider defines an explanation as a set of the system's data inputs that causally drives the decision (i.e., changing the inputs in the set changes the decision) and is irreducible (i.e., changing any subset of the inputs does not change the decision). We (1) demonstrate how this framework may be used to provide explanations for decisions made by general, data-driven AI systems that may incorporate features with arbitrary data types and multiple predictive models, and (2) propose a heuristic procedure to find the most useful explanations depending on the context. We then contrast counterfactual explanations with methods that explain model predictions by weighting features according to their importance (e.g., SHAP, LIME) and present two fundamental reasons why we should carefully consider whether importance-weight explanations are well-suited to explain system decisions. Specifically, we show that (i) features that have a large importance weight for a model prediction may not affect the corresponding decision, and (ii) importance weights are insufficient to communicate whether and how features influence decisions. We demonstrate this with several concise examples and three detailed case studies that compare the counterfactual approach with SHAP to illustrate various conditions under which counterfactual explanations explain data-driven decisions better than importance weights.
Submitted 13 October, 2021; v1 submitted 21 January, 2020;
originally announced January 2020.
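The definition of a counterfactual explanation in the abstract above (a set of inputs whose change flips the decision, with no smaller subset doing so) can be illustrated by a brute-force search. This is a sketch under stated assumptions, not the heuristic procedure the paper proposes: "changing" an input is taken to mean replacing it with a value from a hypothetical `reference` record, and the toy `decide` rule and feature names are invented for illustration.

```python
from itertools import combinations


def counterfactual_explanations(decide, instance, reference):
    """Return every irreducible feature set whose change flips the decision.

    Subsets are tried in order of increasing size, so any set that contains
    an already-found explanation is reducible and can be skipped.
    The search is exponential in the number of features.
    """
    original = decide(instance)
    names = list(instance)
    found = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if any(set(e) <= set(subset) for e in found):
                continue  # contains a smaller explanation, so not irreducible
            changed = {**instance, **{k: reference[k] for k in subset}}
            if decide(changed) != original:
                found.append(subset)
    return found


def decide(x):
    # Toy decision rule: flag large transactions that are foreign or at night.
    return x["amount"] > 1000 and (x["foreign"] or x["night"])


instance = {"amount": 5000, "foreign": True, "night": True}
reference = {"amount": 100, "foreign": False, "night": False}
print(counterfactual_explanations(decide, instance, reference))
```

In this toy case neither `foreign` nor `night` changes the decision alone, but changing both does, so the pair is reported as one irreducible explanation alongside `amount`.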
-
Semi-Supervised Deep Learning Using Improved Unsupervised Discriminant Projection
Authors:
Xiao Han,
Zihao Wang,
Enmei Tu,
Gunnam Suryanarayana,
Jie Yang
Abstract:
Deep learning demands a huge amount of well-labeled data to train the network parameters. How to use the least amount of labeled data to obtain the desired classification accuracy is of great practical significance, because for many real-world applications (such as medical diagnosis), it is difficult to obtain so many labeled samples. In this paper, we modify the unsupervised discriminant projection algorithm from dimension reduction and apply it as a regularization term to propose a new semi-supervised deep learning algorithm, which is able to utilize both the local and nonlocal distribution of abundant unlabeled samples to improve classification performance. Experiments show that given dozens of labeled samples, the proposed algorithm can train a deep network to attain satisfactory classification results.
△ Less
Submitted 19 December, 2019;
originally announced December 2019.
-
Nonparametric Screening under Conditional Strictly Convex Loss for Ultrahigh Dimensional Sparse Data
Authors:
Xu Han
Abstract:
Sure screening has been considered a powerful tool for handling ultrahigh dimensional variable selection problems, where the dimensionality p and the sample size n can satisfy the NP dimensionality log p=O(n^a) for some a>0 (Fan & Lv 2008). The current paper aims to simultaneously tackle the "universality" and "effectiveness" of sure screening procedures. For the "universality", we develop a general and unified framework for nonparametric screening methods from a loss function perspective. Consider a loss function to measure the divergence of the response variable and the underlying nonparametric function of covariates. We propose a new class of loss functions called conditional strictly convex loss, which contains, but is not limited to, negative log likelihood loss from one-parameter exponential families, exponential loss for binary classification and quantile regression loss. The sure screening property and model selection size control will be established within this class of loss functions. For the "effectiveness", we focus on a goodness of fit nonparametric screening (Goffins) method under conditional strictly convex loss. Interestingly, we can achieve a better convergence probability of containing the true model compared with related literature. The superior performance of our proposed method has been further demonstrated by extensive simulation studies and real scientific data examples.
△ Less
Submitted 2 December, 2019;
originally announced December 2019.
-
Re-Evaluating Strengthened-IV Designs: Asymptotic Efficiency, Bias Formula, and the Validity and Power of Sensitivity Analyses
Authors:
Siyu Heng,
Bo Zhang,
Xu Han,
Scott A. Lorch,
Dylan S. Small
Abstract:
Instrumental variables (IVs) are extensively used to estimate treatment effects when the treatment and outcome are confounded by unmeasured confounders; however, weak IVs are often encountered in empirical studies and may cause problems. Many studies have considered building a stronger IV from the original, possibly weak, IV in the design stage of a matched study at the cost of not using some of the samples in the analysis. It is widely accepted that strengthening an IV tends to render nonparametric tests more powerful and will increase the power of sensitivity analyses in large samples. In this article, we re-evaluate this conventional wisdom to bring new insights into this topic. We consider matched observational studies from three perspectives. First, we evaluate the trade-off between IV strength and sample size on nonparametric tests assuming the IV is valid and exhibit conditions under which strengthening an IV increases power and conversely conditions under which it decreases power. Second, we derive a necessary condition for a valid sensitivity analysis model with continuous doses. We show that the $Γ$ sensitivity analysis model, which has been previously used to come to the conclusion that strengthening an IV increases the power of sensitivity analyses in large samples, does not apply to the continuous IV setting and thus this previously reached conclusion may be invalid. Third, we quantify the bias of the Wald estimator with a possibly invalid IV under an oracle and leverage it to develop a valid sensitivity analysis framework; under this framework, we show that strengthening an IV may amplify or mitigate the bias of the estimator, and may or may not increase the power of sensitivity analyses. We also discuss how to better adjust for the observed covariates when building an IV in matched studies.
△ Less
Submitted 15 October, 2021; v1 submitted 20 November, 2019;
originally announced November 2019.
-
SIMPLE: Statistical Inference on Membership Profiles in Large Networks
Authors:
Jianqing Fan,
Yingying Fan,
Xiao Han,
Jinchi Lv
Abstract:
Network data is prevalent in many contemporary big data applications in which a common interest is to unveil important latent links between different pairs of nodes. Yet a simple fundamental question of how to precisely quantify the statistical uncertainty associated with the identification of latent links still remains largely unexplored. In this paper, we propose the method of statistical inference on membership profiles in large networks (SIMPLE) in the setting of degree-corrected mixed membership model, where the null hypothesis assumes that the pair of nodes share the same profile of community memberships. In the simpler case of no degree heterogeneity, the model reduces to the mixed membership model for which an alternative more robust test is also proposed. Both tests are of the Hotelling-type statistics based on the rows of empirical eigenvectors or their ratios, whose asymptotic covariance matrices are very challenging to derive and estimate. Nevertheless, their analytical expressions are unveiled and the unknown covariance matrices are consistently estimated. Under some mild regularity conditions, we establish the exact limiting distributions of the two forms of SIMPLE test statistics under the null hypothesis and contiguous alternative hypothesis. They are the chi-square distributions and the noncentral chi-square distributions, respectively, with degrees of freedom depending on whether the degrees are corrected or not. We also address the important issue of estimating the unknown number of communities and establish the asymptotic properties of the associated test statistics. The advantages and practical utility of our new procedures in terms of both size and power are demonstrated through several simulation examples and real network applications.
Submitted 29 August, 2021; v1 submitted 3 October, 2019;
originally announced October 2019.
-
GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification
Authors:
Jie Zhou,
Xu Han,
Cheng Yang,
Zhiyuan Liu,
Lifeng Wang,
Changcheng Li,
Maosong Sun
Abstract:
Fact verification (FV) is a challenging task which requires retrieving relevant evidence from plain text and using the evidence to verify given claims. Many claims require simultaneously integrating and reasoning over several pieces of evidence for verification. However, previous work employs simple models to extract information from evidence without letting the pieces of evidence communicate with each other, e.g., by merely concatenating the evidence for processing. Therefore, these methods are unable to grasp sufficient relational and logical information among the evidence. To alleviate this issue, we propose a graph-based evidence aggregating and reasoning (GEAR) framework which enables information to transfer on a fully-connected evidence graph and then utilizes different aggregators to collect multi-evidence information. We further employ BERT, an effective pre-trained language representation model, to improve the performance. Experimental results on a large-scale benchmark dataset FEVER have demonstrated that GEAR could leverage multi-evidence information for FV and thus achieves a promising result with a test FEVER score of 67.10%. Our code is available at https://github.com/thunlp/GEAR.
Submitted 22 July, 2019;
originally announced August 2019.
-
Quantifying Similarity between Relations with Fact Distribution
Authors:
Weize Chen,
Hao Zhu,
Xu Han,
Zhiyuan Liu,
Maosong Sun
Abstract:
We introduce a conceptually simple and effective method to quantify the similarity between relations in knowledge bases. Specifically, our approach is based on the divergence between the conditional probability distributions over entity pairs. In this paper, these distributions are parameterized by a very simple neural network. Although computing the exact similarity is intractable, we provide a sampling-based method to get a good approximation. We empirically show the outputs of our approach significantly correlate with human judgments. By applying our method to various tasks, we also find that (1) our approach could effectively detect redundant relations extracted by open information extraction (Open IE) models, that (2) even the most competitive models for relational classification still make mistakes among very similar relations, and that (3) our approach could be incorporated into negative sampling and softmax classification to alleviate these mistakes. The source code and experiment details of this paper can be obtained from https://github.com/thunlp/relation-similarity.
△ Less
Submitted 21 July, 2019;
originally announced July 2019.
-
Adversarial Examples for Electrocardiograms
Authors:
Xintian Han,
Yuxuan Hu,
Luca Foschini,
Larry Chinitz,
Lior Jankelson,
Rajesh Ranganath
Abstract:
In recent years, the electrocardiogram (ECG) has seen a large diffusion in both medical and commercial applications, fueled by the rise of single-lead versions. Single-lead ECG can be embedded in medical devices and wearable products such as the injectable Medtronic Linq monitor, the iRhythm Ziopatch wearable monitor, and the Apple Watch Series 4. Recently, deep neural networks have been used to automatically analyze ECG tracings, outperforming even physicians specialized in cardiac electrophysiology in detecting certain rhythm irregularities. However, deep learning classifiers have been shown to be brittle to adversarial examples, which are examples that appear to the human eye to belong incontrovertibly to a certain class but contain subtle features that fool the classifier into assigning them to the wrong class. Very recently, adversarial examples have also been created for medical-related tasks. Yet, traditional attack methods to create adversarial examples, such as projected gradient descent (PGD), do not extend directly to ECG signals, as they generate examples that introduce square wave artifacts that are not physiologically plausible. Here, we developed a method to construct smoothed adversarial examples for single-lead ECG. First, we implemented a neural network model achieving state-of-the-art performance on the data from the 2017 PhysioNet/Computing-in-Cardiology Challenge for arrhythmia detection from single-lead ECG classification. For this model, we utilized a new technique to generate smoothed examples to produce signals that are 1) indistinguishable to cardiologists from the original examples and 2) incorrectly classified by the neural network. Finally, we show that adversarial examples are not unique and provide a general technique to collate and perturb known adversarial examples to create new ones.
Submitted 4 June, 2019; v1 submitted 13 May, 2019;
originally announced May 2019.
-
Kernelized Complete Conditional Stein Discrepancy
Authors:
Raghav Singhal,
Xintian Han,
Saad Lahlou,
Rajesh Ranganath
Abstract:
Much of machine learning relies on comparing distributions with discrepancy measures. Stein's method creates discrepancy measures between two distributions that require only the unnormalized density of one and samples from the other. Stein discrepancies can be combined with kernels to define kernelized Stein discrepancies (KSDs). While kernels make Stein discrepancies tractable, they pose several challenges in high dimensions. We introduce kernelized complete conditional Stein discrepancies (KCC-SDs). Complete conditionals turn a multivariate distribution into multiple univariate distributions. We show that KCC-SDs distinguish distributions. To show the efficacy of KCC-SDs in distinguishing distributions, we introduce a goodness-of-fit test using KCC-SDs. We empirically show that KCC-SDs have higher power than baselines and use KCC-SDs to assess sample quality in Markov chain Monte Carlo.
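To make the "unnormalized density plus samples" idea concrete, here is a minimal sketch of a standard one-dimensional KSD, not the KCC-SD proposed in the paper. It assumes a standard normal target, whose score function d/dx log p(x) = -x does not depend on the normalizing constant, and an RBF kernel with bandwidth h. The function names, the bandwidth, and the sample sizes are illustrative assumptions.

```python
import math
import random


def stein_kernel(x, y, score, h=1.0):
    """Langevin Stein kernel u_p(x, y) for a 1-D RBF kernel with bandwidth h."""
    d = x - y
    k = math.exp(-d * d / (2.0 * h * h))
    dk_dx = -d / (h * h) * k                      # derivative of k in x
    dk_dy = d / (h * h) * k                       # derivative of k in y
    d2k = (1.0 / (h * h) - d * d / h ** 4) * k    # mixed second derivative
    return score(x) * score(y) * k + score(x) * dk_dy + score(y) * dk_dx + d2k


def ksd_squared(samples, score, h=1.0):
    """V-statistic estimate of the squared KSD between the samples and the target."""
    n = len(samples)
    total = sum(stein_kernel(x, y, score, h) for x in samples for y in samples)
    return total / (n * n)


def standard_normal_score(x):
    """Score of N(0, 1); the normalizing constant drops out of the derivative."""
    return -x


rng = random.Random(0)
matched = [rng.gauss(0.0, 1.0) for _ in range(200)]   # drawn from the target
shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]   # drawn from a shifted law

print("matched:", ksd_squared(matched, standard_normal_score))
print("shifted:", ksd_squared(shifted, standard_normal_score))
```

The estimate for the shifted samples should come out much larger than for the matched samples, which is the behavior a goodness-of-fit test builds on.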
Submitted 17 July, 2020; v1 submitted 9 April, 2019;
originally announced April 2019.
-
FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation
Authors:
Xu Han,
Hao Zhu,
Pengfei Yu,
Ziyun Wang,
Yuan Yao,
Zhiyuan Liu,
Maosong Sun
Abstract:
We present a Few-Shot Relation Classification Dataset (FewRel), consisting of 70,000 sentences on 100 relations derived from Wikipedia and annotated by crowdworkers. The relation of each sentence is first recognized by distant supervision methods, and then filtered by crowdworkers. We adapt the most recent state-of-the-art few-shot learning methods for relation classification and conduct a thorough evaluation of these methods. Empirical results show that even the most competitive few-shot learning models struggle on this task, especially as compared with humans. We also show that a range of different reasoning skills are needed to solve our task. These results indicate that few-shot relation classification remains an open problem and still requires further research. Our detailed analysis points to multiple directions for future research. All details and resources about the dataset and baselines are released on http://zhuhao.me/fewrel.
Submitted 26 October, 2018; v1 submitted 23 October, 2018;
originally announced October 2018.
-
Feature Selection and Model Comparison on Microsoft Learning-to-Rank Data Sets
Authors:
Xinzhi Han,
Sen Lei
Abstract:
With the rapid advance of the Internet, search engines (e.g., Google, Bing, Yahoo!) are used by billions of users each day. The main function of a search engine is to locate the most relevant webpages corresponding to what the user requests. This report focuses on the core problem of information retrieval: how to learn the relevance between a document (very often a webpage) and a query given by the user. Our analysis consists of two parts: 1) we use standard statistical methods to select important features among 137 candidates given by information retrieval researchers from Microsoft. We find that not all the features are useful, and give interpretations of the top-selected features; 2) we give prediction baselines on the real-world dataset MSLR-WEB using various learning algorithms. We find that boosted trees and random forests in general achieve the best prediction performance. This agrees with the mainstream opinion in the information retrieval community that tree-based algorithms outperform the other candidates for this problem.
Submitted 13 March, 2018;
originally announced March 2018.
-
Triangle-mapping Analysis on Spatial Competition and Cooperation of Chinese Cities
Authors:
Pan Liu,
Xiao-Pu Han,
Linyuan Lü
Abstract:
In this paper, we empirically analyze the spatial distribution of Chinese cities using a method based on triangle transition. This method uses a regular triangle mapping from each observed city and its three neighboring cities to analyze the distribution of the mapping positions. We find an obvious center-gathering tendency in the relationship between a city and its nearest three cities, indicating spatial competition between cities. Moreover, we observe competitive trends between neighboring cities with similar economic volume, and a remarkable cooperative tendency between neighboring cities with a large difference in economic volume. The threshold ratio of the two cities' economic volumes at the transition from competition to cooperation is about 1.2. These findings are helpful for understanding the economic relationships among cities, especially in the study of competition and cooperation between cities.
Submitted 2 January, 2018;
originally announced January 2018.
-
A note on quickly sampling a sparse matrix with low rank expectation
Authors:
Karl Rohe,
Jun Tao,
Xintian Han,
Norbert Binkiewicz
Abstract:
Given matrices $X,Y \in R^{n \times K}$ and $S \in R^{K \times K}$ with positive elements, this paper proposes an algorithm fastRG to sample a sparse matrix $A$ with low rank expectation $E(A) = XSY^T$ and independent Poisson elements. This allows for quickly sampling from a broad class of stochastic blockmodel graphs (degree-corrected, mixed membership, overlapping), all of which are specific parameterizations of the generalized random product graph model defined in Section 2.2. The basic idea of fastRG is to first sample the number of edges $m$ and then sample each edge. The key insight is that because of the low rank expectation, it is easy to sample individual edges. The naive "element-wise" algorithm requires $O(n^2)$ operations to generate the $n\times n$ adjacency matrix $A$. In sparse graphs, where $m = O(n)$, ignoring log terms, fastRG runs in time $O(n)$. An implementation of fastRG is available on GitHub. A computational experiment in Section 2.4 simulates graphs up to $n=10,000,000$ nodes with $m = 100,000,000$ edges. For example, on a graph with $n=500,000$ and $m = 5,000,000$, fastRG runs in less than one second on a 3.5 GHz Intel i5.
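The two-stage idea (draw the edge count, then draw each edge) can be sketched as follows. This is one reading of the abstract, not the authors' released code: it assumes the total expected edge count is the sum of the entries of $XSY^T$, that each edge first picks a block pair $(k, l)$ with probability proportional to the block's expected mass, and that its endpoints are then drawn from the normalized columns of $X$ and $Y$. The helper names and the simple Poisson sampler are illustrative, and the sketch favors clarity over the stated $O(n)$ running time.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate only for small means."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def fast_rg_sketch(X, S, Y, rng):
    """Return a list of (i, j) edges; A[i][j] is the multiplicity of (i, j)."""
    n_rows, n_cols, K = len(X), len(Y), len(S)
    # Column sums of X and Y give the total weight attached to each block.
    cx = [sum(X[i][k] for i in range(n_rows)) for k in range(K)]
    cy = [sum(Y[j][l] for j in range(n_cols)) for l in range(K)]
    blocks = [(k, l) for k in range(K) for l in range(K)]
    mass = [cx[k] * S[k][l] * cy[l] for k, l in blocks]
    # Stage 1: the number of edges is Poisson with mean sum(X S Y^T).
    m = sample_poisson(sum(mass), rng)
    # Stage 2: each edge picks a block pair, then one endpoint per side.
    edges = []
    for k, l in rng.choices(blocks, weights=mass, k=m):
        i = rng.choices(range(n_rows), weights=[X[r][k] for r in range(n_rows)])[0]
        j = rng.choices(range(n_cols), weights=[Y[c][l] for c in range(n_cols)])[0]
        edges.append((i, j))
    return edges


# Hypothetical 3-node, 2-block example.
X = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
S = [[2.0, 0.5], [0.5, 2.0]]
print(fast_rg_sketch(X, S, X, random.Random(0)))
```

Because a Poisson total split by independent categorical draws yields independent Poisson counts, the edge multiplicities produced this way have mean $XSY^T$ entry by entry.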
Submitted 8 March, 2017;
originally announced March 2017.
-
Interval Estimation for Conditional Failure Rates of Transmission Lines with Limited Samples
Authors:
Ming Yang,
Jianhui Wang,
Haoran Diao,
Junjian Qi,
Xueshan Han
Abstract:
The estimation of the conditional failure rate (CFR) of an overhead transmission line (OTL) is essential for power system operational reliability assessment. It is hard to predict the CFR precisely, although great efforts have been made to improve the estimation accuracy. One significant difficulty is the lack of available outage samples, due to which the law of large numbers is no longer applicable and no convincing statistical result can be obtained. To address this problem, in this paper a novel imprecise probabilistic approach is proposed to estimate the CFR of an OTL. The imprecise Dirichlet model (IDM) is applied to establish the imprecise probabilistic relation between a single conditional variable and the failure rate of an OTL. Then a credal network is constructed to integrate the IDM estimation results corresponding to different conditional variables and infer the CFR. Instead of providing a single-valued estimation result, the proposed approach predicts the possible interval of the CFR in order to explicitly indicate the uncertainty of the estimation and more objectively represent the available knowledge. The proposed approach is illustrated by estimating the CFRs of two LGJ-300 transmission lines located in the same region; the test results validate its effectiveness.
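For intuition about why an interval is natural with few samples, the sketch below computes the standard imprecise Dirichlet model bounds for a single outcome: with count n out of N observations and prior strength s, the probability lies in [n / (N + s), (n + s) / (N + s)]. It does not reproduce the paper's conditional variables or credal network, and the function name, the choice s = 1, and the outage counts are hypothetical.

```python
def idm_interval(count, total, s=1.0):
    """Lower and upper IDM probabilities for an outcome seen `count` times in `total` trials."""
    lower = count / (total + s)
    upper = (count + s) / (total + s)
    return lower, upper


# Hypothetical data: 2 outages observed in 10 exposure periods, then in 1000.
print(idm_interval(2, 10))       # wide interval with few samples
print(idm_interval(200, 1000))   # interval narrows as samples accumulate
print(idm_interval(0, 0))        # no data: the vacuous interval (0.0, 1.0)
```

The interval width is s / (N + s), so it shrinks toward a point estimate only as the number of observations grows.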
Submitted 2 November, 2016; v1 submitted 23 January, 2016;
originally announced January 2016.
-
Estimation of False Discovery Proportion with Unknown Dependence
Authors:
Jianqing Fan,
Xu Han
Abstract:
Large-scale multiple testing with highly correlated test statistics arises frequently in many areas of scientific research. Incorporating correlation information in estimating false discovery proportion has attracted increasing attention in recent years. When the covariance matrix of test statistics is known, Fan, Han & Gu (2012) provided a consistent estimate of False Discovery Proportion (FDP) under arbitrary dependence structure. However, the covariance matrix is often unknown in many applications and such dependence information has to be estimated before estimating FDP (Efron, 2010). The estimation accuracy can greatly affect the convergence result of FDP or even violate its consistency. In the current paper, we provide methodological modification and theoretical investigations for estimation of FDP with unknown covariance. First, we develop requirements for estimates of eigenvalues and eigenvectors such that we can obtain a consistent estimate of FDP. Second, we give conditions on the dependence structures such that the estimate of FDP is consistent. Such dependence structures include sparse covariance matrices, which have been widely considered in contemporary random matrix theory. When data are sampled from an approximate factor model, which encompasses most practical situations, we provide a consistent estimate of FDP by exploiting this specific dependence structure. The results are further demonstrated by simulation studies and some real data applications.
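As a reminder of the quantity being estimated, the realized FDP at a common threshold t is V(t) / R(t), where R(t) counts all rejections and V(t) counts rejections of true nulls, with the usual convention that the FDP is 0 when nothing is rejected. The sketch below computes it when the truth is known, as in a simulation; it does not implement the paper's estimator, and the function name and p-values are hypothetical.

```python
def realized_fdp(p_values, is_null, t):
    """Fraction of rejections (p <= t) that are true nulls; 0.0 if nothing is rejected."""
    rejected = [p <= t for p in p_values]
    total_rejections = sum(rejected)
    false_rejections = sum(1 for r, null in zip(rejected, is_null) if r and null)
    return false_rejections / total_rejections if total_rejections > 0 else 0.0


# Hypothetical simulated p-values with known truth labels.
p_values = [0.001, 0.02, 0.03, 0.5, 0.8]
is_null = [False, True, False, True, True]
print(realized_fdp(p_values, is_null, t=0.05))
```

In practice the truth labels are unknown, which is why the FDP has to be estimated from the test statistics and their dependence structure.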
Submitted 26 March, 2019; v1 submitted 30 May, 2013;
originally announced May 2013.
-
The effect of winning an Oscar Award on survival: Correcting for healthy performer survivor bias with a rank preserving structural accelerated failure time model
Authors:
Xu Han,
Dylan S. Small,
Dean P. Foster,
Vishal Patel
Abstract:
We study the causal effect of winning an Oscar Award on an actor or actress's survival. Does the increase in social rank from a performer winning an Oscar increase the performer's life expectancy? Previous studies of this issue have suffered from healthy performer survivor bias, that is, candidates who are healthier will be able to act in more films and have more chance to win Oscar Awards. To correct this bias, we adapt Robins' rank preserving structural accelerated failure time model and $g$-estimation method. We show in simulation studies that this approach corrects the bias contained in previous studies. We estimate that the effect of winning an Oscar Award on survival is 4.2 years, with a 95% confidence interval of $[-0.4,8.4]$ years. There is not strong evidence that winning an Oscar increases life expectancy.
Submitted 3 August, 2011;
originally announced August 2011.
-
Control of the False Discovery Rate Under Arbitrary Covariance Dependence
Authors:
Xu Han,
Weijie Gu,
Jianqing Fan
Abstract:
Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any genes are associated with some traits and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging under arbitrary dependence. In the current paper, we propose a new methodology based on principal factor approximation, which successfully subtracts the common dependence and significantly weakens the correlation structure, to deal with an arbitrary dependence structure. We derive the theoretical distribution for false discovery proportion (FDP) in large scale multiple testing when a common threshold is used and provide a consistent estimate of FDP. This result has important applications in controlling FDR and FDP. Our estimate of FDP compares favorably with Efron (2007)'s approach, as demonstrated in the simulated examples. Our approach is further illustrated by some real data applications.
Submitted 20 December, 2010;
originally announced December 2010.
-
Estimating False Discovery Proportion Under Arbitrary Covariance Dependence
Authors:
Jianqing Fan,
Xu Han,
Weijie Gu
Abstract:
Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any SNPs are associated with some traits and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging under arbitrary dependence. In the current paper, we propose a novel method based on principal factor approximation, which successfully subtracts the common dependence and weakens significantly the correlation structure, to deal with an arbitrary dependence structure. We derive an approximate expression for false discovery proportion (FDP) in large scale multiple testing when a common threshold is used and provide a consistent estimate of realized FDP. This result has important applications in controlling FDR and FDP. Our estimate of realized FDP compares favorably with Efron (2007)'s approach, as demonstrated in the simulated examples. Our approach is further illustrated by some real data applications. We also propose a dependence-adjusted procedure, which is more powerful than the fixed threshold procedure.
Submitted 15 November, 2011; v1 submitted 28 October, 2010;
originally announced October 2010.