Showing 1–31 of 31 results for author: Xu, F

Searching in archive stat.

  1. arXiv:2406.11501  [pdf, other]

    cs.LG cs.AI stat.ME

    Teleporter Theory: A General and Simple Approach for Modeling Cross-World Counterfactual Causality

    Authors: Jiangmeng Li, Bin Qin, Qirui Ji, Yi Li, Wenwen Qiang, Jianwen Cao, Fanjiang Xu

    Abstract: Leveraging the development of structural causal model (SCM), researchers can establish graphical models for exploring the causal mechanisms behind machine learning techniques. As the complexity of machine learning applications rises, single-world interventionism causal analysis encounters theoretical adaptation limitations. Accordingly, cross-world counterfactual approach extends our understanding…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2311.12392  [pdf, other]

    stat.ME math.ST stat.ML

    Individualized Dynamic Latent Factor Model for Multi-resolutional Data with Application to Mobile Health

    Authors: Jiuchen Zhang, Fei Xue, Qi Xu, Jung-Ah Lee, Annie Qu

    Abstract: Mobile health has emerged as a major success for tracking individual health status, due to the popularity and power of smartphones and wearable devices. This has also brought great challenges in handling heterogeneous, multi-resolution data which arise ubiquitously in mobile health due to irregular multivariate measurements collected from individuals. In this paper, we propose an individualized dy…

    Submitted 29 May, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: 43 pages, 3 figures

    MSC Class: 82-10; ACM Class: G.3

  3. arXiv:2306.15779  [pdf, other]

    stat.ME math.ST stat.AP

    High-dimensional statistical inference for linkage disequilibrium score regression and its cross-ancestry extensions

    Authors: Fei Xue, Bingxin Zhao

    Abstract: Linkage disequilibrium score regression (LDSC) has emerged as an essential tool for genetic and genomic analyses of complex traits, utilizing high-dimensional data derived from genome-wide association studies (GWAS). LDSC computes the linkage disequilibrium (LD) scores using an external reference panel, and integrates the LD scores with only summary data from the original GWAS. In this paper, we i…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: 13 figures

  4. arXiv:2304.01345  [pdf, other]

    q-bio.NC stat.ME

    Establishing group-level brain structural connectivity incorporating anatomical knowledge under latent space modeling

    Authors: Selena Wang, Yiting Wang, Frederick H. Xu, Li Shen, Yize Zhao

    Abstract: Brain structural connectivity, capturing the white matter fiber tracts among brain regions inferred by diffusion MRI (dMRI), provides a unique characterization of brain anatomical organization. One fundamental question to address with structural connectivity is how to properly summarize and perform statistical inference for a group-level connectivity architecture, for instance, under different sex…

    Submitted 21 February, 2023; originally announced April 2023.

  5. arXiv:2211.13889  [pdf, other]

    stat.ME stat.AP

    An Empirical Bayes Regression for Multi-tissue eQTL Data Analysis

    Authors: Fei Xue, Hongzhe Li

    Abstract: The Genotype-Tissue Expression (GTEx) project collects samples from multiple human tissues to study the relationship between genetic variation or single nucleotide polymorphisms (SNPs) and gene expression in each tissue. However, most existing eQTL analyses only focus on single tissue information. In this paper, we develop a multi-tissue eQTL analysis that improves the single tissue cis-SNP gene e…

    Submitted 24 November, 2022; originally announced November 2022.

  6. arXiv:2206.13984  [pdf, ps, other]

    cs.IT cs.LG stat.ML

    Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach

    Authors: Naifu Zhang, Meixia Tao, Jia Wang, Fan Xu

    Abstract: One of the main focuses in distributed learning is communication efficiency, since model aggregation at each round of training can consist of millions to billions of parameters. Several model compression methods, such as gradient quantization and sparsification, have been proposed to improve the communication efficiency of model aggregation. However, the information-theoretic minimum communication…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: 30 pages, 8 figures, the conference version has been accepted by ISIT2021

  7. arXiv:2106.03344  [pdf, other]

    stat.ME math.ST stat.ML

    Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data

    Authors: Fei Xue, Rong Ma, Hongzhe Li

    Abstract: Blockwise missing data occurs frequently when we integrate multisource or multimodality data where different sources or modalities contain complementary information. In this paper, we consider a high-dimensional linear regression model with blockwise missing covariates and a partially observed response variable. Under this framework, we propose a computationally efficient estimator for the regress…

    Submitted 28 June, 2023; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: V2: 40 pages, 2 figures. Accepted at Statistica Sinica

  8. arXiv:2008.03901  [pdf, other]

    cs.LG cs.CV stat.ML

    RARTS: An Efficient First-Order Relaxed Architecture Search Method

    Authors: Fanghui Xue, Yingyong Qi, Jack Xin

    Abstract: Differentiable architecture search (DARTS) is an effective method for data-driven neural network design based on solving a bilevel optimization problem. Despite its success in many architecture search tasks, there are still some concerns about the accuracy of first-order DARTS and the efficiency of the second-order DARTS. In this paper, we formulate a single level alternative and a relaxed archite…

    Submitted 24 June, 2022; v1 submitted 10 August, 2020; originally announced August 2020.

  9. arXiv:2007.04395  [pdf, other]

    cs.LG cs.AI stat.ML

    Multilevel Graph Matching Networks for Deep Graph Similarity Learning

    Authors: Xiang Ling, Lingfei Wu, Saizhuo Wang, Tengfei Ma, Fangli Xu, Alex X. Liu, Chunming Wu, Shouling Ji

    Abstract: While the celebrated graph neural networks yield effective representations for individual nodes of a graph, there has been relatively less success in extending to the task of graph similarity learning. Recent work on graph similarity learning has considered either global-level graph-graph interactions or low-level node-node interactions, however ignoring the rich cross-level interactions (e.g., be…

    Submitted 7 August, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS)

  10. arXiv:2007.03383  [pdf, other]

    cs.LG cs.IR stat.ML

    RGCF: Refined Graph Convolution Collaborative Filtering with concise and expressive embedding

    Authors: Kang Liu, Feng Xue, Richang Hong

    Abstract: Graph Convolution Network (GCN) has attracted significant attention and become the most popular method for learning graph representations. In recent years, many efforts have been focused on integrating GCN into the recommender tasks and have made remarkable progress. At its core is to explicitly capture high-order connectivities between the nodes in user-item bipartite graph. However, we theoretic…

    Submitted 11 July, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

  11. arXiv:2007.02126  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Deep Graph Random Process for Relational-Thinking-Based Speech Recognition

    Authors: Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang

    Abstract: Lying at the core of human intelligence, relational thinking is characterized by initially relying on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge, consequently becoming a recognizable concept or object through coupling and transformation of these percepts. Such mental processes are difficult to model in real-world problems such as in con…

    Submitted 8 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

    Comments: Accepted at ICML 2020

  12. BusTr: Predicting Bus Travel Times from Real-Time Traffic

    Authors: Richard Barnes, Senaka Buthpitiya, James Cook, Alex Fabrikant, Andrew Tomkins, Fangzhou Xu

    Abstract: We present BusTr, a machine-learned model for translating road traffic forecasts into predictions of bus delays, used by Google Maps to serve the majority of the world's public transit systems where no official real-time bus tracking is provided. We demonstrate that our neural sequence model improves over DeepTTE, the state-of-the-art baseline, both in performance (-30% MAPE) and training stabilit…

    Submitted 2 July, 2020; originally announced July 2020.

    Comments: 14 pages, 2 figures, 5 tables. Citation: "Richard Barnes, Senaka Buthpitiya, James Cook, Alex Fabrikant, Andrew Tomkins, Fangzhou Xu (2020). BusTr: Predicting Bus Travel Times from Real-Time Traffic. 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. doi: 10.1145/3394486.3403376"

  13. arXiv:2006.10027  [pdf, other]

    eess.IV cs.LG stat.ML

    Deep Learning Meets SAR

    Authors: Xiao Xiang Zhu, Sina Montazeri, Mohsin Ali, Yuansheng Hua, Yuanyuan Wang, Lichao Mou, Yilei Shi, Feng Xu, Richard Bamler

    Abstract: Deep learning in remote sensing has become an international hype, but it is mostly limited to the evaluation of optical data. Although deep learning has been introduced in Synthetic Aperture Radar (SAR) data processing, despite successful first attempts, its huge potential remains locked. In this paper, we provide an introduction to the most relevant deep learning models and concepts, point out po…

    Submitted 5 January, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: article accepted by IEEE Geoscience and Remote Sensing Magazine. Copyright may be transferred without notice, after which this version may no longer be accessible

  14. arXiv:2004.13781  [pdf, other]

    cs.CL cs.LG stat.ML

    Graph-to-Tree Neural Networks for Learning Structured Input-Output Translation with Applications to Semantic Parsing and Math Word Problem

    Authors: Shucheng Li, Lingfei Wu, Shiwei Feng, Fangli Xu, Fengyuan Xu, Sheng Zhong

    Abstract: The celebrated Seq2Seq technique and its numerous variants achieve excellent performance on many tasks such as neural machine translation, semantic parsing, and math word problem solving. However, these models either only consider input objects as sequences while ignoring the important structural information for encoding, or they simply treat output objects as sequence outputs instead of structura…

    Submitted 6 October, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: Long Paper in EMNLP 2020. 12 pages including references

  15. arXiv:2001.04629  [pdf, other]

    stat.ME stat.ML

    Multicategory Angle-based Learning for Estimating Optimal Dynamic Treatment Regimes with Censored Data

    Authors: Fei Xue, Yanqing Zhang, Wenzhuo Zhou, Haoda Fu, Annie Qu

    Abstract: An optimal dynamic treatment regime (DTR) consists of a sequence of decision rules in maximizing long-term benefits, which is applicable for chronic diseases such as HIV infection or cancer. In this paper, we develop a novel angle-based approach to search the optimal DTR under a multicategory treatment framework for survival data. The proposed method targets maximization the conditional survival f…

    Submitted 14 January, 2020; originally announced January 2020.

    Comments: 35 pages, 11 figures, 6 tables

  16. arXiv:1911.12675  [pdf, other]

    cs.CV cs.LG cs.NE stat.ML

    Continuous Dropout

    Authors: Xu Shen, Xinmei Tian, Tongliang Liu, Fang Xu, Dacheng Tao

    Abstract: Dropout has been proven to be an effective algorithm for training robust deep networks because of its ability to prevent overfitting by avoiding the co-adaptation of feature detectors. Current explanations of dropout include bagging, naive Bayes, regularization, and sex in evolution. According to the activation patterns of neurons in the human brain, when faced with different situations, the firin…

    Submitted 28 November, 2019; originally announced November 2019.

    Comments: Accepted by TNNLS

  17. arXiv:1910.07115  [pdf, other]

    cs.LG cs.CL cs.SE stat.ML

    HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories

    Authors: Yu Zhang, Frank F. Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, Jiawei Han

    Abstract: GitHub has become an important platform for code sharing and scientific exchange. With the massive number of repositories available, there is a pressing need for topic-based search. Even though the topic label functionality has been introduced, the majority of GitHub repositories do not have any labels, impeding the utility of search and topic-based analysis. This work targets the automatic reposi…

    Submitted 13 November, 2021; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: 10 pages; Accepted to ICDM 2019; Some typos fixed

  18. arXiv:1910.06078  [pdf, other]

    cs.CY stat.ML

    MUTLA: A Large-Scale Dataset for Multimodal Teaching and Learning Analytics

    Authors: Fangli Xu, Lingfei Wu, KP Thai, Carol Hsu, Wei Wang, Richard Tong

    Abstract: Automatic analysis of teacher and student interactions could be very important to improve the quality of teaching and student engagement. However, despite some recent progress in utilizing multimodal data for teaching and learning analytics, a thorough analysis of a rich multimodal dataset coming for a complex real learning environment has yet to be done. To bridge this gap, we present a large-sca…

    Submitted 6 December, 2022; v1 submitted 4 October, 2019; originally announced October 2019.

    Comments: 3 pages, 1 figure, 2 tables; workshop paper

  19. arXiv:1908.07832  [pdf, other]

    cs.CL cs.LG stat.ML

    Parsimonious Morpheme Segmentation with an Application to Enriching Word Embeddings

    Authors: Ahmed El-Kishky, Frank Xu, Aston Zhang, Jiawei Han

    Abstract: Traditionally, many text-mining tasks treat individual word-tokens as the finest meaningful semantic granularity. However, in many languages and specialized corpora, words are composed by concatenating semantically meaningful subword structures. Word-level analysis cannot leverage the semantic information present in such subword structures. With regard to word embedding techniques, this leads to n…

    Submitted 13 November, 2019; v1 submitted 17 August, 2019; originally announced August 2019.

  20. arXiv:1901.03797  [pdf, other]

    stat.ME

    Integrating multi-source block-wise missing data in model selection

    Authors: Fei Xue, Annie Qu

    Abstract: For multi-source data, blocks of variable information from certain sources are likely missing. Existing methods for handling missing data do not take structures of block-wise missing data into consideration. In this paper, we propose a Multiple Block-wise Imputation (MBI) approach, which incorporates imputations based on both complete and incomplete observations. Specifically, for a given missing…

    Submitted 5 April, 2020; v1 submitted 11 January, 2019; originally announced January 2019.

    Comments: 35 pages, 2 figures, accepted for publication in Journal of the American Statistical Association

  21. arXiv:1901.03749  [pdf]

    cs.CV cs.LG stat.ML

    Translating SAR to Optical Images for Assisted Interpretation

    Authors: Shilei Fu, Feng Xu, Ya-Qiu Jin

    Abstract: Despite the advantages of all-weather and all-day high-resolution imaging, SAR remote sensing images are much less viewed and used by general people because human vision is not adapted to microwave scattering phenomenon. However, expert interpreters can be trained by compare side-by-side SAR and optical images to learn the translation rules from SAR to optical. This paper attempts to develop machi…

    Submitted 8 January, 2019; originally announced January 2019.

    Comments: 4 pages, 5 figures, 2 tables, conference

  22. arXiv:1811.01713  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Word Mover's Embedding: From Word2Vec to Document Embedding

    Authors: Lingfei Wu, Ian E. H. Yen, Kun Xu, Fangli Xu, Avinash Balakrishnan, Pin-Yu Chen, Pradeep Ravikumar, Michael J. Witbrock

    Abstract: While the celebrated Word2Vec technique yields semantically rich representations for individual words, there has been relatively less success in extending to generate unsupervised sentences or documents embeddings. Recent work has demonstrated that a distance measure between documents called Word Mover's Distance (WMD) that aligns semantically similar words, yields unprecedented KNN classif…

    Submitted 30 October, 2018; originally announced November 2018.

    Comments: EMNLP'18 Camera-Ready Version

  23. arXiv:1809.05259  [pdf, other]

    cs.LG stat.ML

    Random Warping Series: A Random Features Method for Time-Series Embedding

    Authors: Lingfei Wu, Ian En-Hsu Yen, Jinfeng Yi, Fangli Xu, Qi Lei, Michael Witbrock

    Abstract: Time series data analytics has been a problem of substantial interests for decades, and Dynamic Time Warping (DTW) has been the most widely adopted technique to measure dissimilarity between time series. A number of global-alignment kernels have since been proposed in the spirit of DTW to extend its use to kernel-based estimation method such as support vector machine. However, those kernels suffer…

    Submitted 14 September, 2018; originally announced September 2018.

    Comments: AIStats18, Oral Paper, Add code link for generating RWS

  24. arXiv:1805.11048  [pdf, other]

    cs.LG stat.ML

    Scalable Spectral Clustering Using Random Binning Features

    Authors: Lingfei Wu, Pin-Yu Chen, Ian En-Hsu Yen, Fangli Xu, Yinglong Xia, Charu Aggarwal

    Abstract: Spectral clustering is one of the most effective clustering approaches that capture hidden cluster structures in the data. However, it does not scale well to large-scale problems due to its quadratic complexity in constructing similarity graphs and computing subsequent eigendecomposition. Although a number of methods have been proposed to accelerate spectral clustering, most of them compromise con…

    Submitted 25 November, 2019; v1 submitted 25 May, 2018; originally announced May 2018.

    Comments: KDD'18, Oral Paper, Data and Code link available in the paper

  25. arXiv:1805.07777  [pdf, other]

    cs.CV stat.AP stat.ML

    DLBI: Deep learning guided Bayesian inference for structure reconstruction of super-resolution fluorescence microscopy

    Authors: Yu Li, Fan Xu, Fa Zhang, Pingyong Xu, Mingshu Zhang, Ming Fan, Lihua Li, Xin Gao, Renmin Han

    Abstract: Super-resolution fluorescence microscopy, with a resolution beyond the diffraction limit of light, has become an indispensable tool to directly visualize biological structures in living cells at a nanometer-scale resolution. Despite advances in high-density super-resolution fluorescent techniques, existing methods still have bottlenecks, including extremely long execution time, artificial thinning…

    Submitted 1 September, 2018; v1 submitted 20 May, 2018; originally announced May 2018.

    Comments: Accepted by ISMB 2018

    Journal ref: Bioinformatics, Volume 34, Issue 13, 1 July 2018

  26. arXiv:1802.04956  [pdf, ps, other]

    stat.ML cs.LG

    D2KE: From Distance to Kernel and Embedding

    Authors: Lingfei Wu, Ian En-Hsu Yen, Fangli Xu, Pradeep Ravikumar, Michael Witbrock

    Abstract: For many machine learning problem settings, particularly with structured inputs such as sequences or sets of objects, a distance measure between inputs can be specified more naturally than a feature representation. However, most standard machine models are designed for inputs with a vector feature representation. In this work, we consider the estimation of a function…

    Submitted 25 May, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

    Comments: 15 pages, 4 tables

  27. arXiv:1802.03529  [pdf, other]

    stat.AP physics.optics quant-ph

    Revealing hidden scenes by photon-efficient occlusion-based opportunistic active imaging

    Authors: Feihu Xu, Gal Shulkind, Christos Thrampoulidis, Jeffrey H. Shapiro, Antonio Torralba, Franco N. C. Wong, Gregory W. Wornell

    Abstract: The ability to see around corners, i.e., recover details of a hidden scene from its reflections in the surrounding environment, is of considerable interest in a wide range of applications. However, the diffuse nature of light reflected from typical surfaces leads to mixing of spatial information in the collected light, precluding useful scene reconstruction. Here, we employ a computational imaging…

    Submitted 10 February, 2018; originally announced February 2018.

    Comments: Related theory in arXiv:1711.06297

    Journal ref: Optics Express 26, 9945 (2018)

  28. arXiv:1709.05062  [pdf, other]

    stat.ME

    Individualized Multi-directional Variable Selection

    Authors: Xiwei Tang, Fei Xue, Annie Qu

    Abstract: In this paper we propose a heterogeneous modeling framework which achieves individual-wise feature selection and individualized covariates' effects subgrouping simultaneously. In contrast to conventional model selection approaches, the new approach constructs a separation penalty with multi-directional shrinkages, which facilitates individualized modeling to distinguish strong signals from noisy o…

    Submitted 10 June, 2019; v1 submitted 15 September, 2017; originally announced September 2017.

  29. arXiv:1709.04840  [pdf, other]

    stat.ME math.ST

    Semi-standard partial covariance variable selection when irrepresentable conditions fail

    Authors: Fei Xue, Annie Qu

    Abstract: Traditional variable selection methods could fail to be sign consistent when irrepresentable conditions are violated. This is especially critical in high-dimensional settings when the number of predictors exceeds the sample size. In this paper, we propose a new semi-standard partial covariance (SPAC) approach which is capable of reducing correlation effects from other covariates while fully captur…

    Submitted 24 April, 2022; v1 submitted 14 September, 2017; originally announced September 2017.

    Comments: 40 pages, 1 figure. Revised according to referee's suggestions. To appear in Statistica Sinica

  30. arXiv:1609.00626  [pdf, other]

    cs.CL stat.AP

    SynsetRank: Degree-adjusted Random Walk for Relation Identification

    Authors: Shinichi Nakajima, Sebastian Krause, Dirk Weissenborn, Sven Schmeier, Nico Goernitz, Feiyu Xu

    Abstract: In relation extraction, a key process is to obtain good detectors that find relevant sentences describing the target relation. To minimize the necessity of labeled data for refining detectors, previous work successfully made use of BabelNet, a semantic graph structure expressing relationships between synsets, as side information or prior knowledge. The goal of this paper is to enhance the use of g…

    Submitted 15 September, 2016; v1 submitted 2 September, 2016; originally announced September 2016.

  31. Assessment of mortgage default risk via Bayesian state space models

    Authors: Tevfik Aktekin, Refik Soyer, Feng Xu

    Abstract: Managing risk at the aggregate level is crucial for banks and financial institutions as required by the Basel III framework. In this paper, we introduce discrete time Bayesian state space models with Poisson measurements to model aggregate mortgage default rate. We discuss parameter updating, filtering, smoothing, forecasting and estimation using Markov chain Monte Carlo methods. In addition, we i…

    Submitted 28 November, 2013; originally announced November 2013.

    Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/13-AOAS632

    Report number: IMS-AOAS-AOAS632

    Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 3, 1450-1473