Showing 1–45 of 45 results for author: Han, X

  1. arXiv:2406.01813  [pdf, other]

    stat.ML cs.AI cs.LG stat.AP stat.ME

    Diffusion Boosted Trees

    Authors: Xizewen Han, Mingyuan Zhou

    Abstract: Combining the merits of both denoising diffusion probabilistic models and gradient boosting, the diffusion boosting paradigm is introduced for tackling supervised learning problems. We develop Diffusion Boosted Trees (DBT), which can be viewed as both a new denoising diffusion generative model parameterized by decision trees (one single tree for each diffusion timestep), and a new boosting algorit…

    Submitted 3 June, 2024; originally announced June 2024.

  2. arXiv:2406.01252  [pdf, other]

    cs.CL cs.AI stat.ML

    Towards Scalable Automated Alignment of LLMs: A Survey

    Authors: Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu

    Abstract: Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approach…

    Submitted 3 June, 2024; originally announced June 2024.

  3. arXiv:2404.01216  [pdf, other]

    cs.LG cs.SI stat.ML

    Novel Node Category Detection Under Subpopulation Shift

    Authors: Hsing-Huan Chung, Shravan Chaudhari, Yoav Wald, Xing Han, Joydeep Ghosh

    Abstract: In real-world graph data, distribution shifts can manifest in various ways, such as the emergence of new categories and changes in the relative proportions of existing categories. It is often important to detect nodes of novel categories under such distribution shifts for safety or insight discovery purposes. We introduce a new approach, Recall-Constrained Optimization with Selective Link Predicti…

    Submitted 30 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to ECML-PKDD 2024

  4. arXiv:2403.01673  [pdf, other]

    stat.ML cs.AI cs.LG

    CATS: Enhancing Multivariate Time Series Forecasting by Constructing Auxiliary Time Series as Exogenous Variables

    Authors: Jiecheng Lu, Xu Han, Yan Sun, Shihao Yang

    Abstract: For Multivariate Time Series Forecasting (MTSF), recent deep learning applications show that univariate models frequently outperform multivariate ones. To address the deficiency in multivariate models, we introduce a method to Construct Auxiliary Time Series (CATS) that functions like a 2D temporal-contextual attention mechanism, which generates Auxiliary Time Series (ATS) from Original Time Seri…

    Submitted 3 March, 2024; originally announced March 2024.

  5. arXiv:2402.14959  [pdf, other]

    stat.AP cs.CY stat.ML

    A Causal Framework to Evaluate Racial Bias in Law Enforcement Systems

    Authors: Jessy Xinyi Han, Andrew Miller, S. Craig Watkins, Christopher Winship, Fotini Christia, Devavrat Shah

    Abstract: We are interested in developing a data-driven method to evaluate race-induced biases in law enforcement systems. While the recent works have addressed this question in the context of police-civilian interactions using police stop data, they have two key limitations. First, bias can only be properly quantified if true criminality is accounted for in addition to race, but it is absent in prior works…

    Submitted 20 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  6. arXiv:2310.09488  [pdf, other]

    stat.ML cs.LG

    ARM: Refining Multivariate Forecasting with Adaptive Temporal-Contextual Learning

    Authors: Jiecheng Lu, Xu Han, Shihao Yang

    Abstract: Long-term time series forecasting (LTSF) is important for various domains but is confronted by challenges in handling the complex temporal-contextual relationships. As multivariate input models underperform some recent univariate counterparts, we posit that the issue lies in the inefficiency of existing multivariate LTSF Transformers to model series-wise relationships: the characteristic differ…

    Submitted 14 October, 2023; originally announced October 2023.

  7. arXiv:2309.16597  [pdf]

    cs.LG cs.AI stat.ML

    Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces

    Authors: Zhou Fan, Xinran Han, Zi Wang

    Abstract: Bayesian optimization (BO) is a popular black-box function optimization method, which makes sequential decisions based on a Bayesian model, typically a Gaussian process (GP), of the function. To ensure the quality of the model, transfer learning approaches have been developed to automatically design GP priors by learning from observations on "training" functions. These training functions are typic…

    Submitted 13 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Journal ref: Transactions on Machine Learning Research (TMLR), February 2024

  8. arXiv:2304.13940  [pdf, other]

    stat.ML cs.LG

    A Majorization-Minimization Gauss-Newton Method for 1-Bit Matrix Completion

    Authors: Xiaoqian Liu, Xu Han, Eric C. Chi, Boaz Nadler

    Abstract: In 1-bit matrix completion, the aim is to estimate an underlying low-rank matrix from a partial set of binary observations. We propose a novel method for 1-bit matrix completion called MMGN. Our method is based on the majorization-minimization (MM) principle, which converts the original optimization problem into a sequence of standard low-rank matrix completion problems. We solve each of these sub…

    Submitted 22 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: 28 pages, 7 figures

  9. arXiv:2212.10538  [pdf, other]

    cs.LG stat.ML

    HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes

    Authors: Zhou Fan, Xinran Han, Zi Wang

    Abstract: Bayesian optimization (BO), while proved highly effective for many black-box function optimization tasks, requires practitioners to carefully select priors that well model their functions of interest. Rather than specifying by hand, researchers have investigated transfer learning based methods to automatically learn the priors, e.g. multi-task BO (Swersky et al., 2013), few-shot BO (Wistuba and Gr…

    Submitted 28 September, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Full version of the workshop paper at 2022 NeurIPS Workshop on Gaussian Processes, Spatiotemporal Modeling, and Decision-making Systems

  10. arXiv:2210.10643  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards Accurate Subgraph Similarity Computation via Neural Graph Pruning

    Authors: Linfeng Liu, Xu Han, Dawei Zhou, Li-Ping Liu

    Abstract: Subgraph similarity search, one of the core problems in graph search, concerns whether a target graph approximately contains a query graph. The problem has recently been approached by neural methods. However, current neural methods do not consider pruning the target graph, though pruning is critically important in traditional calculations of subgraph similarities. One obstacle to applying pruning in neural…

    Submitted 19 October, 2022; originally announced October 2022.

    Journal ref: Transactions on Machine Learning Research (TMLR) October 2022

  11. arXiv:2208.10759  [pdf, other]

    cs.LG stat.ML

    Survival Mixture Density Networks

    Authors: Xintian Han, Mark Goldstein, Rajesh Ranganath

    Abstract: Survival analysis, the art of time-to-event modeling, plays an important role in clinical treatment decisions. Recently, continuous time models built from neural ODEs have been proposed for survival analysis. However, the training of neural ODEs is slow due to the high computational complexity of neural ODE solvers. Here, we propose an efficient alternative for flexible continuous time models, cal…

    Submitted 23 August, 2022; originally announced August 2022.

    Comments: Machine Learning for Healthcare 2022

  12. arXiv:2208.08497  [pdf, ps, other]

    stat.ML cs.LG q-fin.MF

    Choquet regularization for reinforcement learning

    Authors: Xia Han, Ruodu Wang, Xun Yu Zhou

    Abstract: We propose Choquet regularizers to measure and manage the level of exploration for reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)) in which we replace the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton-Jacobi-Bellman equation of the problem, and solve it ex…

    Submitted 17 August, 2022; originally announced August 2022.

  13. arXiv:2206.13092  [pdf, other]

    stat.ML cs.LG

    Split Localized Conformal Prediction

    Authors: Xing Han, Ziyang Tang, Joydeep Ghosh, Qiang Liu

    Abstract: Conformal prediction is a simple and powerful tool that can quantify uncertainty without any distributional assumptions. Many existing methods only address the average coverage guarantee, which is not ideal compared to the stronger conditional coverage guarantee. Existing methods of approximating conditional coverage require additional models or time effort, which makes them not easy to scale. In…

    Submitted 20 February, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

    Comments: 21 pages, 7 figures, 8 tables

  14. arXiv:2206.07275  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    CARD: Classification and Regression Diffusion Models

    Authors: Xizewen Han, Huangjie Zheng, Mingyuan Zhou

    Abstract: Learning the distribution of a continuous or categorical response variable $\boldsymbol y$ given its covariates $\boldsymbol x$ is a fundamental problem in statistics and machine learning. Deep neural network-based supervised learning algorithms have made great progress in predicting the mean of $\boldsymbol y$ given $\boldsymbol x$, but they are often criticized for their ability to accurately ca…

    Submitted 6 December, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  15. arXiv:2205.12004  [pdf, other]

    quant-ph cs.AI cs.LG stat.ML

    Quantum Kerr Learning

    Authors: Junyu Liu, Changchun Zhong, Matthew Otten, Anirban Chandra, Cristian L. Cortes, Chaoyang Ti, Stephen K Gray, Xu Han

    Abstract: Quantum machine learning is a rapidly evolving field of research that could facilitate important applications for quantum computing and also significantly impact data-driven sciences. In our work, based on various arguments from complexity theory and physics, we demonstrate that a single Kerr mode can provide some "quantum enhancements" when dealing with kernel-based methods. Using kernel properti…

    Submitted 30 November, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: 20 pages, many figures. v2: significant updates, author added

    Journal ref: Mach. Learn.: Sci. Technol. 4 025003, 2023

  16. arXiv:2111.08175  [pdf, other]

    cs.LG stat.ML

    Inverse-Weighted Survival Games

    Authors: Xintian Han, Mark Goldstein, Aahlad Puli, Thomas Wies, Adler J Perotte, Rajesh Ranganath

    Abstract: Deep models trained through maximum likelihood have achieved state-of-the-art results for survival analysis. Despite this training scheme, practitioners evaluate models under other criteria, such as binary classification losses at a chosen set of time horizons, e.g. Brier score (BS) and Bernoulli log likelihood (BLL). Models trained with maximum likelihood may have poor BS or BLL since maximum lik…

    Submitted 31 January, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021

  17. arXiv:2111.02664  [pdf]

    cs.DL stat.AP

    Scientists are Working Overtime and at the Weekends: Comparison of Publication Downloading from Copyrighted and Pirated Platforms

    Authors: Yu Geng, Ren-Meng Cao, Xiao-Pu Han, Wen-Can Tian, Guang-Yao Zhang, Xian-Wen Wang

    Abstract: In this study, we track and analyze publication downloads from both copyrighted and pirated platforms to reconstruct scientists' activity patterns from a holistic perspective. Scientists around the world are working overtime, but scientists in different countries have different working patterns. Scientists' preferences for different platforms are influenced by a variety of factors such as working…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: 15 pages, 6 figures

  18. arXiv:2106.09632  [pdf, other]

    stat.ME stat.AP

    Large-Scale Multiple Testing for Matrix-Valued Data under Double Dependency

    Authors: Xu Han, Sanat Sarkar, Shiyu Zhang

    Abstract: High-dimensional inference based on matrix-valued data has drawn increasing attention in modern statistical research, yet not much progress has been made in large-scale multiple testing specifically designed for analysing such data sets. Motivated by this, we consider in this article an electroencephalography (EEG) experiment that produces matrix-valued data and presents a scope of developing nove…

    Submitted 17 June, 2021; originally announced June 2021.

    MSC Class: 62F03; 62H15

  19. arXiv:2106.09071  [pdf, ps, other]

    stat.ME math.ST stat.ML

    Pre-processing with Orthogonal Decompositions for High-dimensional Explanatory Variables

    Authors: Xu Han, Ethan X Fang, Cheng Yong Tang

    Abstract: Strong correlations between explanatory variables are problematic for high-dimensional regularized regression methods. Due to the violation of the Irrepresentable Condition, the popular LASSO method may suffer from false inclusions of inactive variables. In this paper, we propose pre-processing with orthogonal decompositions (PROD) for the explanatory variables in high-dimensional regressions. The…

    Submitted 16 June, 2021; originally announced June 2021.

    MSC Class: 62J05

  20. arXiv:2106.08881  [pdf, other]

    cs.LG stat.ME

    Nonparametric Empirical Bayes Estimation and Testing for Sparse and Heteroscedastic Signals

    Authors: Junhui Cai, Xu Han, Ya'acov Ritov, Linda Zhao

    Abstract: Large-scale modern data often involves estimation and testing for high-dimensional unknown parameters. It is desirable to identify the sparse signals, "the needles in the haystack", with accuracy and false discovery control. However, the unprecedented complexity and heterogeneity in modern data structure require new machine learning tools to effectively exploit commonalities and to robustly adju…

    Submitted 5 November, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

  21. Skilled Mutual Fund Selection: False Discovery Control under Dependence

    Authors: Lijia Wang, Xu Han, Xin Tong

    Abstract: Selecting skilled mutual funds through the multiple testing framework has received increasing attention from finance researchers and statisticians. The intercept $α$ of Carhart four-factor model is commonly used to measure the true performance of mutual funds, and positive $α$'s are considered as skilled. We observe that the standardized OLS estimates of $α$'s across the funds possess strong depen…

    Submitted 25 February, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted for publication

    MSC Class: 62F03; 62J05

    Journal ref: Journal of Business and Economic Statistics 2022

  22. arXiv:2106.06189  [pdf, other]

    stat.ML cs.LG cs.SI

    Order Matters: Probabilistic Modeling of Node Sequence for Graph Generation

    Authors: Xiaohui Chen, Xu Han, Jiajing Hu, Francisco J. R. Ruiz, Liping Liu

    Abstract: A graph generative model defines a distribution over graphs. One type of generative model is constructed by autoregressive neural networks, which sequentially add nodes and edges to generate a graph. However, the likelihood of a graph under the autoregressive model is intractable, as there are numerous sequences leading to the given graph; this makes maximum likelihood estimation challenging. Inst…

    Submitted 14 June, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: ICML 2021

  23. arXiv:2106.02073  [pdf, other]

    cs.LG cs.AI math.DG math.OC stat.ML

    Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path

    Authors: X. Y. Han, Vardan Papyan, David L. Donoho

    Abstract: The recently discovered Neural Collapse (NC) phenomenon occurs pervasively in today's deep net training paradigm of driving cross-entropy (CE) loss towards zero. During NC, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Recent works d…

    Submitted 9 May, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: ICLR 2022 Outstanding Paper Prize & Oral. Appendix contains [A] empirical experiments, [B-D] proofs of theoretical results, and [E] survey of related works examining Neural Collapse

  24. arXiv:2101.05346  [pdf, other]

    cs.LG stat.ML

    X-CAL: Explicit Calibration for Survival Analysis

    Authors: Mark Goldstein, Xintian Han, Aahlad Puli, Adler J. Perotte, Rajesh Ranganath

    Abstract: Survival analysis models the distribution of time until an event of interest, such as discharge from the hospital or admission to the ICU. When a model's predicted number of events within any time interval is similar to the observed number, it is called well-calibrated. A survival model's calibration can be measured using, for instance, distributional calibration (D-CALIBRATION) [Haider et al., 20…

    Submitted 13 January, 2021; originally announced January 2021.

  25. arXiv:2010.00729  [pdf, other]

    stat.ME

    Individual-centered partial information in social networks

    Authors: Xiao Han, Y. X. Rachel Wang, Qing Yang, Xin Tong

    Abstract: In statistical network analysis, we often assume either the full network is available or multiple subgraphs can be sampled to estimate various global properties of the network. However, in a real social network, people frequently make decisions based on their local view of the network alone. Here, we consider a partial information framework that characterizes the local network centered at a given…

    Submitted 2 July, 2024; v1 submitted 1 October, 2020; originally announced October 2020.

  26. arXiv:2008.08186  [pdf, other]

    cs.LG cs.CV stat.ML

    Prevalence of Neural Collapse during the terminal phase of deep learning training

    Authors: Vardan Papyan, X. Y. Han, David L. Donoho

    Abstract: Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where training error first vanishes; during TPT, the training error stays effectively zero while training loss is pushed towards zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasi…

    Submitted 21 August, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

  27. arXiv:2007.07224  [pdf, other]

    cs.IR cs.LG stat.ML

    AutoRec: An Automated Recommender System

    Authors: Ting-Hsiang Wang, Qingquan Song, Xiaotian Han, Zirui Liu, Haifeng Jin, Xia Hu

    Abstract: Realistic recommender systems are often required to adapt to ever-changing data and tasks or to explore different models systematically. To address the need, we present AutoRec, an open-source automated machine learning (AutoML) platform extended from the TensorFlow ecosystem and, to our knowledge, the first framework to leverage AutoML for model search and hyperparameter tuning in deep recommenda…

    Submitted 26 June, 2020; originally announced July 2020.

  28. arXiv:2001.07417  [pdf, other]

    cs.LG cs.AI stat.ML

    Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach

    Authors: Carlos Fernández-Loría, Foster Provost, Xintian Han

    Abstract: We examine counterfactual explanations for explaining the decisions made by model-based AI systems. The counterfactual approach we consider defines an explanation as a set of the system's data inputs that causally drives the decision (i.e., changing the inputs in the set changes the decision) and is irreducible (i.e., changing any subset of the inputs does not change the decision). We (1) demonstr…

    Submitted 13 October, 2021; v1 submitted 21 January, 2020; originally announced January 2020.

  29. arXiv:1912.09147  [pdf, other]

    cs.LG stat.ML

    Semi-Supervised Deep Learning Using Improved Unsupervised Discriminant Projection

    Authors: Xiao Han, Zihao Wang, Enmei Tu, Gunnam Suryanarayana, Jie Yang

    Abstract: Deep learning demands a huge amount of well-labeled data to train the network parameters. How to use the least amount of labeled data to obtain the desired classification accuracy is of great practical significance, because for many real-world applications (such as medical diagnosis), it is difficult to obtain so many labeled samples. In this paper, we modify the unsupervised discriminant projection…

    Submitted 19 December, 2019; originally announced December 2019.

    Comments: 1 figure

  30. arXiv:1912.01157  [pdf, ps, other]

    math.ST stat.ME

    Nonparametric Screening under Conditional Strictly Convex Loss for Ultrahigh Dimensional Sparse Data

    Authors: Xu Han

    Abstract: The sure screening technique has been considered a powerful tool for handling ultrahigh dimensional variable selection problems, where the dimensionality p and the sample size n can satisfy the NP dimensionality log p=O(n^a) for some a>0 (Fan & Lv 2008). The current paper aims to simultaneously tackle the "universality" and "effectiveness" of sure screening procedures. For the "universality", we d…

    Submitted 2 December, 2019; originally announced December 2019.

    Comments: Supplementary materials including the technical proofs are available online at Annals of Statistics

    Journal ref: Annals of Statistics, 2019, Vol 47, No 4, 1995-2022

  31. arXiv:1911.09171  [pdf, other]

    stat.ME stat.AP

    Re-Evaluating Strengthened-IV Designs: Asymptotic Efficiency, Bias Formula, and the Validity and Power of Sensitivity Analyses

    Authors: Siyu Heng, Bo Zhang, Xu Han, Scott A. Lorch, Dylan S. Small

    Abstract: Instrumental variables (IVs) are extensively used to estimate treatment effects when the treatment and outcome are confounded by unmeasured confounders; however, weak IVs are often encountered in empirical studies and may cause problems. Many studies have considered building a stronger IV from the original, possibly weak, IV in the design stage of a matched study at the cost of not using some of t…

    Submitted 15 October, 2021; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: 86 pages, 4 figures, 6 tables

  32. arXiv:1910.01734  [pdf, other]

    stat.ME math.ST stat.ML

    SIMPLE: Statistical Inference on Membership Profiles in Large Networks

    Authors: Jianqing Fan, Yingying Fan, Xiao Han, Jinchi Lv

    Abstract: Network data is prevalent in many contemporary big data applications in which a common interest is to unveil important latent links between different pairs of nodes. Yet a simple fundamental question of how to precisely quantify the statistical uncertainty associated with the identification of latent links still remains largely unexplored. In this paper, we propose the method of statistical infere…

    Submitted 29 August, 2021; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: 59 pages, 4 figures; Journal of the Royal Statistical Society Series B, to appear

  33. arXiv:1908.01843  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification

    Authors: Jie Zhou, Xu Han, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, Maosong Sun

    Abstract: Fact verification (FV) is a challenging task which requires retrieving relevant evidence from plain text and using the evidence to verify given claims. Many claims require simultaneously integrating and reasoning over several pieces of evidence for verification. However, previous work employs simple models to extract information from evidence without letting evidence communicate with each other, e.g.…

    Submitted 22 July, 2019; originally announced August 2019.

    Comments: Accepted by ACL 2019

  34. arXiv:1907.08937  [pdf, other]

    cs.AI cs.CL cs.LG stat.ML

    Quantifying Similarity between Relations with Fact Distribution

    Authors: Weize Chen, Hao Zhu, Xu Han, Zhiyuan Liu, Maosong Sun

    Abstract: We introduce a conceptually simple and effective method to quantify the similarity between relations in knowledge bases. Specifically, our approach is based on the divergence between the conditional probability distributions over entity pairs. In this paper, these distributions are parameterized by a very simple neural network. Although computing the exact similarity is intractable, we provide a…

    Submitted 21 July, 2019; originally announced July 2019.

    Comments: ACL 2019

  35. arXiv:1905.05163  [pdf, other]

    eess.SP cs.CR cs.LG stat.ML

    Adversarial Examples for Electrocardiograms

    Authors: Xintian Han, Yuxuan Hu, Luca Foschini, Larry Chinitz, Lior Jankelson, Rajesh Ranganath

    Abstract: In recent years, the electrocardiogram (ECG) has seen a large diffusion in both medical and commercial applications, fueled by the rise of single-lead versions. Single-lead ECG can be embedded in medical devices and wearable products such as the injectable Medtronic Linq monitor, the iRhythm Ziopatch wearable monitor, and the Apple Watch Series 4. Recently, deep neural networks have been used to a…

    Submitted 4 June, 2019; v1 submitted 13 May, 2019; originally announced May 2019.

  36. arXiv:1904.04478  [pdf, other]

    stat.ML cs.LG

    Kernelized Complete Conditional Stein Discrepancy

    Authors: Raghav Singhal, Xintian Han, Saad Lahlou, Rajesh Ranganath

    Abstract: Much of machine learning relies on comparing distributions with discrepancy measures. Stein's method creates discrepancy measures between two distributions that require only the unnormalized density of one and samples from the other. Stein discrepancies can be combined with kernels to define kernelized Stein discrepancies (KSDs). While kernels make Stein discrepancies tractable, they pose several…

    Submitted 17 July, 2020; v1 submitted 9 April, 2019; originally announced April 2019.

  37. arXiv:1810.10147  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.ML

    FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation

    Authors: Xu Han, Hao Zhu, Pengfei Yu, Ziyun Wang, Yuan Yao, Zhiyuan Liu, Maosong Sun

    Abstract: We present a Few-Shot Relation Classification Dataset (FewRel), consisting of 70,000 sentences on 100 relations derived from Wikipedia and annotated by crowdworkers. The relation of each sentence is first recognized by distant supervision methods, and then filtered by crowdworkers. We adapt the most recent state-of-the-art few-shot learning methods for relation classification and conduct a thorou…

    Submitted 26 October, 2018; v1 submitted 23 October, 2018; originally announced October 2018.

    Comments: EMNLP 2018. The first four authors contribute equally. The order is determined by dice rolling. Visit our website http://zhuhao.me/fewrel

  38. arXiv:1803.05127  [pdf, other]

    stat.AP cs.IR

    Feature Selection and Model Comparison on Microsoft Learning-to-Rank Data Sets

    Authors: Xinzhi Han, Sen Lei

    Abstract: With the rapid advance of the Internet, search engines (e.g., Google, Bing, Yahoo!) are used by billions of users each day. The main function of a search engine is to locate the most relevant webpages corresponding to what the user requests. This report focuses on the core problem of information retrieval: how to learn the relevance between a document (very often a webpage) and a query given by…

    Submitted 13 March, 2018; originally announced March 2018.

    Comments: 24 pages

  39. arXiv:1801.00641  [pdf, ps, other]

    physics.soc-ph cs.CY stat.AP

    Triangle-mapping Analysis on Spatial Competition and Cooperation of Chinese Cities

    Authors: Pan Liu, Xiao-Pu Han, Linyuan Lü

    Abstract: In this paper, we empirically analyze the spatial distribution of Chinese cities using a method based on triangle transition. This method uses a regular triangle mapping from the observed cities and its three neighboring cities to analyze their distribution of mapping positions. We find an obvious center-gathering tendency for the relationship between cities and its nearest three cities, indicat…

    Submitted 2 January, 2018; originally announced January 2018.

    Comments: The 43rd Annual Conference of the IEEE Industrial Electronics Society

  40. arXiv:1703.02998  [pdf, other]

    stat.CO

    A note on quickly sampling a sparse matrix with low rank expectation

    Authors: Karl Rohe, Jun Tao, Xintian Han, Norbert Binkiewicz

    Abstract: Given matrices $X,Y \in R^{n \times K}$ and $S \in R^{K \times K}$ with positive elements, this paper proposes an algorithm fastRG to sample a sparse matrix $A$ with low rank expectation $E(A) = XSY^T$ and independent Poisson elements. This allows for quickly sampling from a broad class of stochastic blockmodel graphs (degree-corrected, mixed membership, overlapping) all of which are specific para…

    Submitted 8 March, 2017; originally announced March 2017.

  41. Interval Estimation for Conditional Failure Rates of Transmission Lines with Limited Samples

    Authors: Ming Yang, Jianhui Wang, Haoran Diao, Junjian Qi, Xueshan Han

    Abstract: The estimation of the conditional failure rate (CFR) of an overhead transmission line (OTL) is essential for power system operational reliability assessment. It is hard to predict the CFR precisely, although great efforts have been made to improve the estimation accuracy. One significant difficulty is the lack of available outage samples, due to which the law of large numbers is no longer applicab…

    Submitted 2 November, 2016; v1 submitted 23 January, 2016; originally announced January 2016.

    Comments: 11 pages, 6 figures

  42. arXiv:1305.7007  [pdf, other]

    stat.ME

    Estimation of False Discovery Proportion with Unknown Dependence

    Authors: Jianqing Fan, Xu Han

    Abstract: Large-scale multiple testing with highly correlated test statistics arises frequently in scientific research. Incorporating correlation information in estimating false discovery proportion has attracted increasing attention in recent years. When the covariance matrix of test statistics is known, Fan, Han & Gu (2012) provided a consistent estimate of False Discovery Proportion (FDP) under arbi…

    Submitted 26 March, 2019; v1 submitted 30 May, 2013; originally announced May 2013.

    Comments: 39 pages, 7 figures

    MSC Class: 62F03

    Journal ref: Published in Journal of Royal Statistical Society-Methodology, Vol 79, 1143-1164, 2017

  43. The effect of winning an Oscar Award on survival: Correcting for healthy performer survivor bias with a rank preserving structural accelerated failure time model

    Authors: Xu Han, Dylan S. Small, Dean P. Foster, Vishal Patel

    Abstract: We study the causal effect of winning an Oscar Award on an actor or actress's survival. Does the increase in social rank from a performer winning an Oscar increase the performer's life expectancy? Previous studies of this issue have suffered from healthy performer survivor bias, that is, candidates who are healthier will be able to act in more films and have more chance to win Oscar Awards. To cor…

    Submitted 3 August, 2011; originally announced August 2011.

    Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS424 by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS424

    Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 2A, 746-772

  44. arXiv:1012.4397  [pdf, other]

    stat.ME

    Control of the False Discovery Rate Under Arbitrary Covariance Dependence

    Authors: Xu Han, Weijie Gu, Jianqing Fan

    Abstract: Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any genes are associated with some traits and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging und…

    Submitted 20 December, 2010; originally announced December 2010.

    Comments: 44 pages, 7 figures

    Report number: 1010.6056v2

    MSC Class: 62H15; 62P10

  45. arXiv:1010.6056  [pdf, other]

    stat.ME math.ST

    Estimating False Discovery Proportion Under Arbitrary Covariance Dependence

    Authors: Jianqing Fan, Xu Han, Weijie Gu

    Abstract: Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any SNPs are associated with some traits and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging unde…

    Submitted 15 November, 2011; v1 submitted 28 October, 2010; originally announced October 2010.

    Comments: 51 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1012.4397

    MSC Class: 62H15; 62P10