-
Exploring causal effects of hormone- and radio-treatments in an observational study of breast cancer using copula-based semi-competing risks models
Authors:
Tonghui Yu,
Mengjiao Peng,
Yifan Cui,
Elynn Chen,
Chixiang Chen
Abstract:
Breast cancer patients may experience relapse or death after surgery during the follow-up period, leading to dependent censoring of relapse. This phenomenon, known as semi-competing risk, imposes challenges in analyzing treatment effects on breast cancer and necessitates advanced statistical tools for unbiased analysis. Despite progress in estimation and inference within semi-competing risks regression, its application to causal inference is still in its early stages. This article aims to propose a frequentist and semi-parametric framework based on copula models that can facilitate valid causal inference, net quantity estimation and interpretation, and sensitivity analysis for unmeasured factors under right-censored semi-competing risks data. We also propose novel procedures to enhance parameter estimation and its applicability in real practice. After that, we apply the proposed framework to a breast cancer study and detect the time-varying causal effects of hormone- and radio-treatments on patients' relapse-free survival and overall survival. Moreover, extensive numerical evaluations demonstrate the method's feasibility, highlighting minimal estimation bias and reliable statistical inference.
Submitted 1 July, 2024;
originally announced July 2024.
-
Advancing Information Integration through Empirical Likelihood: Selective Reviews and a New Idea
Authors:
Chixiang Chen,
Jia Liang,
Elynn Chen,
Ming Wang
Abstract:
Information integration plays a pivotal role in biomedical studies by facilitating the combination and analysis of independent datasets from multiple studies, thereby uncovering valuable insights that might otherwise remain obscured due to the limited sample size in individual studies. However, sharing raw data from independent studies presents significant challenges, primarily due to the need to safeguard sensitive participant information and the cumbersome paperwork involved in data sharing. In this article, we first provide a selective review of recent methodological developments in information integration via empirical likelihood, wherein only summary information is required, rather than the raw data. Following this, we introduce a new insight and a potentially promising framework that could broaden the application of information integration across a wider spectrum. Furthermore, this new framework offers computational convenience compared to classic empirical likelihood-based methods. We provide numerical evaluations to assess its performance and discuss various extensions in the end.
Submitted 29 June, 2024;
originally announced July 2024.
-
Factor Augmented Matrix Regression
Authors:
Elynn Chen,
Jianqing Fan,
Xiaonan Zhu
Abstract:
We introduce Factor-Augmented Matrix Regression (FAMAR) to address the growing applications of matrix-variate data and their associated challenges, particularly with high-dimensionality and covariate correlations. FAMAR encompasses two key algorithms. The first is a novel non-iterative approach that efficiently estimates the factors and loadings of the matrix factor model, utilizing techniques of pre-training, diverse projection, and block-wise averaging. The second algorithm offers an accelerated solution for penalized matrix factor regression. Both algorithms are supported by established statistical and numerical convergence properties. Empirical evaluations, conducted on synthetic and real economics datasets, demonstrate FAMAR's superiority in terms of accuracy, interpretability, and computational speed. Our application to economic data showcases how matrix factors can be incorporated to predict the GDPs of the countries of interest, and the influence of these factors on the GDPs.
Submitted 27 May, 2024;
originally announced May 2024.
-
Distributed Tensor Principal Component Analysis
Authors:
Elynn Chen,
Xi Chen,
Wenbo Jing,
Yichen Zhang
Abstract:
As tensors become widespread in modern data analysis, Tucker low-rank Principal Component Analysis (PCA) has become essential for dimensionality reduction and structural discovery in tensor datasets. Motivated by the common scenario where large-scale tensors are distributed across diverse geographic locations, this paper investigates tensor PCA within a distributed framework where direct data pooling is impractical.
We offer a comprehensive analysis of three specific scenarios in distributed Tensor PCA: a homogeneous setting in which tensors at various locations are generated from a single noise-affected model; a heterogeneous setting where tensors at different locations come from distinct models but share some principal components, aiming to improve estimation across all locations; and a targeted heterogeneous setting, designed to boost estimation accuracy at a specific location with limited samples by utilizing transferred knowledge from other sites with ample data.
We introduce novel estimation methods tailored to each scenario, establish statistical guarantees, and develop distributed inference techniques to construct confidence regions. Our theoretical findings demonstrate that these distributed methods achieve sharp rates of accuracy by efficiently aggregating shared information across different tensors, while maintaining reasonable communication costs. Empirical validation through simulations and real-world data applications highlights the advantages of our approaches, particularly in managing heterogeneous tensor data.
Submitted 19 May, 2024;
originally announced May 2024.
-
Dynamic Contextual Pricing with Doubly Non-Parametric Random Utility Models
Authors:
Elynn Chen,
Xi Chen,
Lan Gao,
Jiayu Li
Abstract:
In the evolving landscape of digital commerce, adaptive dynamic pricing strategies are essential for gaining a competitive edge. This paper introduces novel doubly nonparametric random utility models that eschew traditional parametric assumptions used in estimating consumer demand's mean utility function and noise distribution. Existing nonparametric methods like multi-scale Distributional Nearest Neighbors (DNN and TDNN), initially designed for offline regression, face challenges in dynamic online pricing due to design limitations, such as the indirect observability of utility-related variables and the absence of uniform convergence guarantees. We address these challenges with innovative population equations that facilitate nonparametric estimation within decision-making frameworks and establish new analytical results on the uniform convergence rates of DNN and TDNN, enhancing their applicability in dynamic environments.
Our theoretical analysis confirms that the statistical learning rates for the mean utility function and noise distribution are minimax optimal. We also derive a regret bound that illustrates the critical interaction between model dimensionality and noise distribution smoothness, deepening our understanding of dynamic pricing under varied market conditions. These contributions offer substantial theoretical insights and practical tools for implementing effective, data-driven pricing strategies, advancing the theoretical framework of pricing models and providing robust methodologies for navigating the complexities of modern markets.
Submitted 10 June, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
Data-Driven Knowledge Transfer in Batch $Q^*$ Learning
Authors:
Elynn Chen,
Xi Chen,
Wenbo Jing
Abstract:
In data-driven decision-making in marketing, healthcare, and education, it is desirable to utilize a large amount of data from existing ventures to navigate high-dimensional feature spaces and address data scarcity in new ventures. We explore knowledge transfer in dynamic decision-making by concentrating on batch stationary environments and formally defining task discrepancies through the lens of Markov decision processes (MDPs). We propose a framework of Transferred Fitted $Q$-Iteration algorithm with general function approximation, enabling the direct estimation of the optimal action-state function $Q^*$ using both target and source data. We establish the relationship between statistical performance and MDP task discrepancy under sieve approximation, shedding light on the impact of source and target sample sizes and task discrepancy on the effectiveness of knowledge transfer. We show that the final learning error of the $Q^*$ function is significantly improved from the single task rate both theoretically and empirically.
Submitted 31 March, 2024;
originally announced April 2024.
-
Time-Varying Matrix Factor Models
Authors:
Bin Chen,
Elynn Y. Chen,
Stevenson Bolivar,
Rong Chen
Abstract:
Matrix-variate data of high dimensions are frequently observed in finance and economics, spanning extended time periods, such as the long-term data on international trade flows among numerous countries. To address potential structural shifts and explore the matrix structure's informational context, we propose a time-varying matrix factor model. This model accommodates changing factor loadings over time, revealing the underlying dynamic structure through nonparametric principal component analysis and facilitating dimension reduction. We establish the consistency and asymptotic normality of our estimators under general conditions that allow for weak correlations across time, rows, or columns of the noise. A novel approach is introduced to overcome rotational ambiguity in the estimators, enhancing the clarity and interpretability of the estimated loading matrices. Our simulation study highlights the merits of the proposed estimators and the effectiveness of the smoothing operation. In an application to international trade flow, we investigate the trading hubs, centrality, patterns, and trends in the trading network.
Submitted 1 April, 2024;
originally announced April 2024.
-
Riemannian Residual Neural Networks
Authors:
Isay Katsman,
Eric Ming Chen,
Sidhanth Holalkere,
Anna Asch,
Aaron Lou,
Ser-Nam Lim,
Christopher De Sa
Abstract:
Recent methods in geometric deep learning have introduced various neural networks to operate over data that lie on Riemannian manifolds. Such networks are often necessary to learn well over graphs with a hierarchical structure or to learn over manifold-valued data encountered in the natural sciences. These networks are often inspired by and directly generalize standard Euclidean neural networks. However, extending Euclidean networks is difficult and has only been done for a select few manifolds. In this work, we examine the residual neural network (ResNet) and show how to extend this construction to general Riemannian manifolds in a geometrically principled manner. Originally introduced to help solve the vanishing gradient problem, ResNets have become ubiquitous in machine learning due to their beneficial learning properties, excellent empirical results, and easy-to-incorporate nature when building varied neural networks. We find that our Riemannian ResNets mirror these desirable properties: when compared to existing manifold neural networks designed to learn over hyperbolic space and the manifold of symmetric positive definite matrices, we outperform both kinds of networks in terms of relevant testing metrics and training dynamics.
Submitted 15 October, 2023;
originally announced October 2023.
-
Aggregating human judgment probabilistic predictions of COVID-19 transmission, burden, and preventative measures
Authors:
Allison Codi,
Damon Luk,
David Braun,
Juan Cambeiro,
Tamay Besiroglu,
Eva Chen,
Luis Enrique Urtubey de Cèsaris,
Paolo Bocchini,
Thomas McAndrew
Abstract:
Aggregated human judgment forecasts for COVID-19 targets of public health importance are accurate, often outperforming computational models. Our work shows aggregated human judgment forecasts for infectious agents are timely, accurate, and adaptable, and can be used as a tool to aid public health decision making during outbreaks.
Submitted 14 April, 2022; v1 submitted 5 April, 2022;
originally announced April 2022.
-
Chimeric forecasting: combining probabilistic predictions from computational models and human judgment
Authors:
Thomas McAndrew,
Allison Codi,
Juan Cambeiro,
Tamay Besiroglu,
David Braun,
Eva Chen,
Luis Enrique Urtubey de Cesaris,
Damon Luk
Abstract:
Forecasts of the trajectory of an infectious agent can help guide public health decision making. A traditional approach to forecasting fits a computational model to structured data and generates a predictive distribution. However, human judgment has access to the same data as computational models plus experience, intuition, and subjective data. We propose a chimeric ensemble -- a combination of computational and human judgment forecasts -- as a novel approach to predicting the trajectory of an infectious agent. Each month from January 2021 to June 2021 we asked two generalist crowds, using the same criteria as the COVID-19 Forecast Hub, to submit a predictive distribution over incident cases and deaths at the US national level either two or three weeks into the future, and combined these human judgment forecasts with forecasts from computational models submitted to the COVID-19 Forecast Hub into a chimeric ensemble. We find that a chimeric ensemble, compared to an ensemble including only computational models, improves predictions of incident cases and shows similar performance for predictions of incident deaths. A chimeric ensemble is a flexible, supportive public health tool and shows promising results for predictions of the spread of an infectious agent.
Submitted 20 February, 2022;
originally announced February 2022.
-
Transferred Q-learning
Authors:
Elynn Y. Chen,
Michael I. Jordan,
Sai Li
Abstract:
We consider $Q$-learning with knowledge transfer, using samples from a target reinforcement learning (RL) task as well as source samples from different but related RL tasks. We propose transfer learning algorithms for both batch and online $Q$-learning with offline source studies. The proposed transferred $Q$-learning algorithm contains a novel re-targeting step that enables vertical information-cascading along multiple steps in an RL task, besides the usual horizontal information-gathering as transfer learning (TL) for supervised learning. We establish the first theoretical justifications of TL in RL tasks by showing a faster rate of convergence of the $Q$ function estimation in the offline RL transfer, and a lower regret bound in the offline-to-online RL transfer under certain similarity assumptions. Empirical evidence from both synthetic and real datasets is presented to back up the proposed algorithm and our theoretical results.
Submitted 9 February, 2022;
originally announced February 2022.
-
Reinforcement Learning with Heterogeneous Data: Estimation and Inference
Authors:
Elynn Y. Chen,
Rui Song,
Michael I. Jordan
Abstract:
Reinforcement Learning (RL) has the promise of providing data-driven support for decision-making in a wide range of problems in healthcare, education, business, and other domains. Classical RL methods focus on the mean of the total return and, thus, may provide misleading results in the setting of the heterogeneous populations that commonly underlie large-scale datasets. We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity. We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class. Our auto-clustered algorithms can automatically detect and identify homogeneous sub-populations, while estimating the Q function and the optimal policy for each sub-population. We establish convergence rates and construct confidence intervals for the estimators obtained by the ACPE and ACPI. We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset. The latter analysis shows evidence of value heterogeneity and confirms the advantages of our new method.
Submitted 31 January, 2022;
originally announced February 2022.
-
Truncated Rank-Based Tests for Two-Part Models with Excessive Zeros and Applications to Microbiome Data
Authors:
Wanjie Wang,
Eric Z. Chen,
Hongzhe Li
Abstract:
High-throughput sequencing technology allows us to test the compositional difference of bacteria in different populations. One important feature of human microbiome data is that it often includes a large number of zeros. Such data can be treated as being generated from a two-part model that includes a zero point-mass. Motivated by analysis of such non-negative data with excessive zeros, we introduce several truncated rank-based two-group and multi-group tests for such data, including a truncated rank-based Wilcoxon rank-sum test for two-group comparison and two truncated Kruskal-Wallis tests for multi-group comparison. We show both analytically through asymptotic relative efficiency analysis and by simulations that the proposed tests have higher power than the standard rank-based tests, especially when the proportion of zeros in the data is high. The tests can also be applied to repeated measurements of compositional data via simple within-subject permutations. In a simple before-and-after treatment experiment, the within-subject permutation is similar to the paired rank test. However, the proposed tests handle the excessive zeros, which leads to better power. We apply the tests to the analysis of a gut microbiome data set to compare the microbiome compositions of healthy and pediatric Crohn's disease patients and to assess the treatment effects on microbiome compositions. We identify several bacterial genera that are missed by the standard rank-based tests.
Submitted 21 August, 2022; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Information-theoretic Classification Accuracy: A Criterion that Guides Data-driven Combination of Ambiguous Outcome Labels in Multi-class Classification
Authors:
Chihao Zhang,
Yiling Elaine Chen,
Shihua Zhang,
Jingyi Jessica Li
Abstract:
Outcome labeling ambiguity and subjectivity are ubiquitous in real-world datasets. While practitioners commonly combine ambiguous outcome labels for all data points (instances) in an ad hoc way to improve the accuracy of multi-class classification, there is no principled approach to guide the label combination for all data points by any optimality criterion. To address this problem, we propose the information-theoretic classification accuracy (ITCA), a criterion that balances the trade-off between prediction accuracy (how well predicted labels agree with actual labels) and classification resolution (how many labels are predictable), to guide practitioners on how to combine ambiguous outcome labels. To find the optimal label combination indicated by ITCA, we propose two search strategies: greedy search and breadth-first search. Notably, ITCA and the two search strategies are adaptive to all machine-learning classification algorithms. Coupled with a classification algorithm and a search strategy, ITCA has two uses: improving prediction accuracy and identifying ambiguous labels. We first verify that ITCA achieves high accuracy with both search strategies in finding the correct label combinations on synthetic and real data. Then we demonstrate the effectiveness of ITCA in diverse applications including medical prognosis, cancer survival prediction, user demographics prediction, and cell type classification. We also provide theoretical insights into ITCA by studying the oracle and the linear discriminant analysis classification algorithms. Python package itca (available at https://github.com/JSB-UCLA/ITCA) implements ITCA and search strategies.
Submitted 2 July, 2022; v1 submitted 1 September, 2021;
originally announced September 2021.
-
Community Network Auto-Regression for High-Dimensional Time Series
Authors:
Elynn Y. Chen,
Jianqing Fan,
Xuening Zhu
Abstract:
Modeling responses on the nodes of a large-scale network is an important task that arises commonly in practice. This paper proposes a community network vector autoregressive (CNAR) model, which utilizes the network structure to characterize the dependence and intra-community homogeneity of the high dimensional time series. The CNAR model greatly increases the flexibility and generality of the network vector autoregressive (Zhu et al, 2017, NAR) model by allowing heterogeneous network effects across different network communities. In addition, the non-community-related latent factors are included to account for unknown cross-sectional dependence. The number of network communities can diverge as the network expands, which leads to estimating a diverging number of model parameters. We obtain a set of stationary conditions and develop an efficient two-step weighted least-squares estimator. The consistency and asymptotic normality properties of the estimators are established. The theoretical results show that the two-step estimator improves the one-step estimator by an order of magnitude when the error admits a factor structure. The advantages of the CNAR model are further illustrated on a variety of synthetic and real datasets.
Submitted 10 July, 2020;
originally announced July 2020.
-
Accuracy Prediction with Non-neural Model for Neural Architecture Search
Authors:
Renqian Luo,
Xu Tan,
Rui Wang,
Tao Qin,
Enhong Chen,
Tie-Yan Liu
Abstract:
Neural architecture search (NAS) with an accuracy predictor that predicts the accuracy of candidate architectures has drawn increasing attention due to its simplicity and effectiveness. Previous works usually employ neural network-based predictors, which require more delicate design and are prone to overfitting. Considering that most architectures are represented as sequences of discrete symbols, which are more like tabular data and preferred by non-neural predictors, in this paper we study an alternative approach that uses a non-neural model for accuracy prediction. Specifically, as decision tree based models can better handle tabular data, we leverage gradient boosting decision tree (GBDT) as the predictor for NAS. We demonstrate that the GBDT predictor can achieve prediction accuracy comparable to (if not better than) neural network-based predictors. Moreover, considering that a compact search space can ease the search process, we propose to prune the search space gradually according to important features derived from GBDT. In this way, NAS can be performed by first pruning the search space and then searching a neural architecture, which is more efficient and effective. Experiments on NASBench-101 and ImageNet demonstrate the effectiveness of using GBDT as the predictor for NAS: (1) On NASBench-101, it is 22x, 8x, and 6x more sample efficient than random search, regularized evolution, and Monte Carlo Tree Search (MCTS) in finding the global optimum; (2) It achieves 24.2% top-1 error rate on ImageNet, and further achieves 23.4% top-1 error rate on ImageNet when enhanced with search space pruning. Code is provided at https://github.com/renqianluo/GBDT-NAS.
Submitted 19 July, 2021; v1 submitted 9 July, 2020;
originally announced July 2020.
-
ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction
Authors:
Zhongkai Hao,
Chengqiang Lu,
Zheyuan Hu,
Hao Wang,
Zhenya Huang,
Qi Liu,
Enhong Chen,
Cheekong Lee
Abstract:
Molecular property prediction (e.g., energy) is an essential problem in chemistry and biology. Unfortunately, many supervised learning methods suffer from the scarcity of labeled molecules in the chemical space, where such property labels are generally obtained by Density Functional Theory (DFT) calculation, which is computationally very costly. An effective solution is to incorporate the unlabeled molecules in a semi-supervised fashion. However, learning semi-supervised representations for large amounts of molecules is challenging, with issues including the joint representation of both molecular essence and structure, and the conflict between representation and property learning. Here we propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) that incorporates both labeled and unlabeled molecules. Specifically, ASGN adopts a teacher-student framework. In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution. Then in the student model, we target the property prediction task to deal with the learning loss conflict. Finally, we propose a novel active learning strategy in terms of molecular diversities to select informative data during the whole framework learning. We conduct extensive experiments on several public datasets. Experimental results show the remarkable performance of our ASGN framework.
Submitted 7 July, 2020;
originally announced July 2020.
-
Semi-parametric TEnsor Factor Analysis by Iteratively Projected Singular Value Decomposition
Authors:
Elynn Y. Chen,
Dong Xia,
Chencheng Cai,
Jianqing Fan
Abstract:
This paper introduces a general framework of Semi-parametric TEnsor Factor Analysis (STEFA) that focuses on the methodology and theory of low-rank tensor decomposition with auxiliary covariates. Semi-parametric TEnsor Factor Analysis models extend tensor factor models by incorporating auxiliary covariates in the loading matrices. We propose an algorithm of iteratively projected singular value decomposition (IP-SVD) for the semi-parametric estimation. It iteratively projects tensor data onto the linear space spanned by the basis functions of covariates and applies singular value decomposition on matricized tensors over each mode. We establish the convergence rates of the loading matrices and the core tensor factor. The theoretical results only require a sub-exponential noise distribution, which is weaker than the assumption of sub-Gaussian tail of noise in the literature. Compared with the Tucker decomposition, IP-SVD yields more accurate estimators with a faster convergence rate. Besides estimation, we propose several prediction methods with new covariates based on the STEFA model. On both synthetic and real tensor data, we demonstrate the efficacy of the STEFA model and the IP-SVD algorithm on both the estimation and prediction tasks.
Submitted 2 April, 2024; v1 submitted 5 July, 2020;
originally announced July 2020.
-
Diagnosis Prevalence vs. Efficacy in Machine-learning Based Diagnostic Decision Support
Authors:
Gil Alon,
Elizabeth Chen,
Guergana Savova,
Carsten Eickhoff
Abstract:
Many recent studies use machine learning to predict a small number of ICD-9-CM codes. In practice, on the other hand, physicians have to consider a broader range of diagnoses. This study aims to put these previously incongruent evaluation settings on a more equal footing by predicting ICD-9-CM codes based on electronic health record properties and demonstrating the relationship between diagnosis prevalence and system performance. We extracted patient features from the MIMIC-III dataset for each admission. We trained and evaluated 43 different machine learning classifiers. Among this pool, the most successful classifier was a Multi-Layer Perceptron. In accordance with general machine learning expectation, we observed all classifiers' F1 scores to drop as disease prevalence decreased. Scores fell from 0.28 for the 50 most prevalent ICD-9-CM codes to 0.03 for the 1000 most prevalent ICD-9-CM codes. Statistical analyses showed a moderate positive correlation between disease prevalence and efficacy (0.5866).
Submitted 24 June, 2020;
originally announced June 2020.
-
On Projection Robust Optimal Transport: Sample Complexity and Model Misspecification
Authors:
Tianyi Lin,
Zeyu Zheng,
Elynn Y. Chen,
Marco Cuturi,
Michael I. Jordan
Abstract:
Optimal transport (OT) distances are increasingly used as loss functions for statistical inference, notably in the learning of generative models or supervised learning. Yet, the behavior of minimum Wasserstein estimators is poorly understood, notably in high-dimensional regimes or under model misspecification. In this work we adopt the viewpoint of projection robust (PR) OT, which seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected. Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances, complementing and improving previous literature that has been restricted to one-dimensional and well-specified cases. Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces. Our complexity bounds can help explain why both PRW and IPRW distances outperform Wasserstein distances empirically in high-dimensional inference tasks. Finally, we consider parametric inference using the PRW distance. We provide an asymptotic guarantee of two types of minimum PRW estimators and formulate a central limit theorem for max-sliced Wasserstein estimator under model misspecification. To enable our analysis on PRW with projection dimension larger than one, we devise a novel combination of variational analysis and statistical theory.
Submitted 17 July, 2021; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Median regression with differential privacy
Authors:
E Chen,
Ying Miao,
Yu Tang
Abstract:
Median regression analysis has robustness properties which make it attractive compared with regression based on the mean, while differential privacy can protect individual privacy during statistical analysis of certain datasets. In this paper, three privacy preserving methods are proposed for median regression. The first algorithm is based on a finite smoothing method, the second provides an iterative way and the last one further employs the greedy coordinate descent approach. Privacy preserving properties of these three methods are all proved. Accuracy bound or convergence properties of these algorithms are also provided. Numerical calculation shows that the first method has better accuracy than the others when the sample size is small. When the sample size becomes larger, the first method needs more time while the second method needs less time with well-matched accuracy. For the third method, it costs less time in both cases, while it highly depends on step size.
Submitted 4 June, 2020;
originally announced June 2020.
-
Semi-Supervised Neural Architecture Search
Authors:
Renqian Luo,
Xu Tan,
Rui Wang,
Tao Qin,
Enhong Chen,
Tie-Yan Liu
Abstract:
Neural architecture search (NAS) relies on a good controller to generate better architectures or predict the accuracy of given architectures. However, training the controller requires both abundant and high-quality pairs of architectures and their accuracy, while it is costly to evaluate an architecture and obtain its accuracy. In this paper, we propose SemiNAS, a semi-supervised NAS approach that leverages numerous unlabeled architectures (without evaluation and thus nearly no cost). Specifically, SemiNAS 1) trains an initial accuracy predictor with a small set of architecture-accuracy data pairs; 2) uses the trained accuracy predictor to predict the accuracy of large amount of architectures (without evaluation); and 3) adds the generated data pairs to the original data to further improve the predictor. The trained accuracy predictor can be applied to various NAS algorithms by predicting the accuracy of candidate architectures for them. SemiNAS has two advantages: 1) It reduces the computational cost under the same accuracy guarantee. On NASBench-101 benchmark dataset, it achieves comparable accuracy with gradient-based method while using only 1/7 architecture-accuracy pairs. 2) It achieves higher accuracy under the same computational cost. It achieves 94.02% test accuracy on NASBench-101, outperforming all the baselines when using the same number of architectures. On ImageNet, it achieves 23.5% top-1 error rate (under 600M FLOPS constraint) using 4 GPU-days for search. We further apply it to LJSpeech text to speech task and it achieves 97% intelligibility rate in the low-resource setting and 15% test error rate in the robustness setting, with 9%, 7% improvements over the baseline respectively.
Submitted 3 November, 2020; v1 submitted 24 February, 2020;
originally announced February 2020.
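The three-step procedure in the SemiNAS abstract above (train an initial predictor, pseudo-label unevaluated architectures, retrain on the enlarged set) can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: the ridge regressor stands in for their accuracy predictor, architectures are assumed to be already encoded as numeric feature vectors, and the names `fit_ridge` and `semi_supervised_predictor` are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression (hypothetical stand-in for the accuracy predictor)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def semi_supervised_predictor(X_lab, y_lab, X_unlab, lam=1e-2):
    """Self-training loop following the three steps listed in the abstract."""
    # Step 1: train an initial predictor on the small labeled set.
    w0 = fit_ridge(X_lab, y_lab, lam)
    # Step 2: predict accuracies of unlabeled architectures (no evaluation needed).
    y_pseudo = X_unlab @ w0
    # Step 3: add the generated pairs to the original data and retrain.
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    return fit_ridge(X_all, y_all, lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_lab = rng.normal(size=(20, 5))        # 20 "evaluated" architectures (synthetic)
    y_lab = X_lab @ np.arange(1.0, 6.0)     # synthetic accuracies, assumed linear
    X_unlab = rng.normal(size=(200, 5))     # 200 unevaluated architectures
    print(semi_supervised_predictor(X_lab, y_lab, X_unlab))
```

With a linear predictor the pseudo-labels add little new information, so the sketch only shows the data flow; any benefit in practice would depend on a more expressive predictor such as the one the paper describes.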
-
Modeling Multivariate Spatial-Temporal Data with Latent Low-Dimensional Dynamics
Authors:
Elynn Y. Chen,
Xin Yun,
Rong Chen,
Qiwei Yao
Abstract:
High-dimensional multivariate spatial-temporal data arise frequently in a wide range of applications; however, there are relatively few statistical methods that can simultaneously deal with spatial, temporal and variable-wise dependencies in large data sets. In this paper, we propose a new approach to utilize the correlations in variable, space and time to achieve dimension reduction and to facilitate spatial/temporal predictions in the high-dimensional settings. The multivariate spatial-temporal process is represented as a linear transformation of a lower-dimensional latent factor process. The spatial dependence structure of the factor process is further represented non-parametrically in terms of latent empirical orthogonal functions. The low-dimensional structure is completely unknown in our setting and is learned entirely from data collected irregularly over space but regularly over time. We propose innovative estimation and prediction methods based on the latent low-rank structures. Asymptotic properties of the estimators and predictors are established. Extensive experiments on synthetic and real data sets show that, while the dimensions are reduced significantly, the spatial, temporal and variable-wise covariance structures are largely preserved. The efficacy of our method is further confirmed by the prediction performances on both synthetic and real data sets.
Submitted 1 February, 2020;
originally announced February 2020.
-
Deep Technology Tracing for High-tech Companies
Authors:
Han Wu,
Kun Zhang,
Guangyi Lv,
Qi Liu,
Runlong Yu,
Weihao Zhao,
Enhong Chen,
Jianhui Ma
Abstract:
Technological change and innovation are vitally important, especially for high-tech companies. However, the factors influencing their future research and development (R&D) trends are both complicated and varied, which makes technology tracing for high-tech companies a difficult task. To this end, in this paper, we develop a novel data-driven solution, i.e., the Deep Technology Forecasting (DTF) framework, to automatically find the most likely technology directions customized to each high-tech company. Specifically, DTF consists of three components: Potential Competitor Recognition (PCR), Collaborative Technology Recognition (CTR), and the Deep Technology Tracing (DTT) neural network. PCR and CTR aim to capture competitive relations among enterprises and collaborative relations among technologies, respectively. DTT is designed for modeling dynamic interactions between companies and technologies with the above relations involved. Finally, we evaluate our DTF framework on real-world patent data, and the experimental results clearly show that DTF can precisely help to forecast the future technology emphasis of companies by exploiting hybrid factors.
Submitted 2 January, 2020;
originally announced January 2020.
-
Statistical Inference for High-Dimensional Matrix-Variate Factor Model
Authors:
Elynn Y. Chen,
Jianqing Fan
Abstract:
This paper considers the estimation and inference of the low-rank components in high-dimensional matrix-variate factor models, where each dimension of the matrix-variates ($p \times q$) is comparable to or greater than the number of observations ($T$). We propose an estimation method called $α$-PCA that preserves the matrix structure and aggregates mean and contemporary covariance through a hyper-parameter $α$. We develop an inferential theory, establishing consistency, the rate of convergence, and the limiting distributions, under general conditions that allow for correlations across time, rows, or columns of the noise. We show both theoretical and empirical methods of choosing the best $α$, depending on the use-case criteria. Simulation results demonstrate the adequacy of the asymptotic results in approximating the finite sample properties. The $α$-PCA compares favorably with the existing ones. Finally, we illustrate its applications with a real numeric data set and two real image data sets. In all applications, the proposed estimation procedure outperforms previous methods in the power of variance explanation using out-of-sample 10-fold cross-validation.
Submitted 19 October, 2022; v1 submitted 7 January, 2020;
originally announced January 2020.
-
Variance Reduced Local SGD with Lower Communication Complexity
Authors:
Xianfeng Liang,
Shuheng Shen,
Jingchang Liu,
Zhen Pan,
Enhong Chen,
Yifei Cheng
Abstract:
To accelerate the training of machine learning models, distributed stochastic gradient descent (SGD) and its variants have been widely adopted, which apply multiple workers in parallel to speed up training. Among them, Local SGD has gained much attention due to its lower communication cost. Nevertheless, when the data distribution on workers is non-identical, Local SGD requires $O(T^{\frac{3}{4}} N^{\frac{3}{4}})$ communications to maintain its \emph{linear iteration speedup} property, where $T$ is the total number of iterations and $N$ is the number of workers. In this paper, we propose Variance Reduced Local SGD (VRL-SGD) to further reduce the communication complexity. Benefiting from eliminating the dependency on the gradient variance among workers, we theoretically prove that VRL-SGD achieves a \emph{linear iteration speedup} with a lower communication complexity $O(T^{\frac{1}{2}} N^{\frac{3}{2}})$ even if workers access non-identical datasets. We conduct experiments on three machine learning tasks, and the experimental results demonstrate that VRL-SGD performs impressively better than Local SGD when the data among workers are quite diverse.
Submitted 30 December, 2019;
originally announced December 2019.
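As a rough reading aid for the two communication bounds quoted in the VRL-SGD abstract above, their ratio can be worked out directly. This treats the stated orders as if they were exact expressions and ignores hidden constants, which is an assumption of this note rather than a claim from the paper.

```latex
\[
\frac{T^{1/2} N^{3/2}}{T^{3/4} N^{3/4}}
  = T^{\frac{1}{2}-\frac{3}{4}} \, N^{\frac{3}{2}-\frac{3}{4}}
  = \frac{N^{3/4}}{T^{1/4}}
  = \left( \frac{N^{3}}{T} \right)^{1/4}
\]
% The ratio is below 1 exactly when T > N^3, so under this simplification
% the VRL-SGD bound is the smaller of the two once the number of iterations T
% exceeds the cube of the number of workers N.
```

For example, with $N = 8$ workers the crossover under this simplification sits at $T = 8^3 = 512$ iterations.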
-
Estimating Early Fundraising Performance of Innovations via Graph-based Market Environment Model
Authors:
Likang Wu,
Zhi Li,
Hongke Zhao,
Zhen Pan,
Qi Liu,
Enhong Chen
Abstract:
Well begun is half done. In the crowdfunding market, the early fundraising performance of a project is an issue of concern for both creators and platforms. However, estimating the early fundraising performance before the project is published is very challenging and still under-explored. To that end, in this paper, we present a focused study on this important problem from a market modeling view. Specifically, we propose a Graph-based Market Environment model (GME) for estimating the early fundraising performance of the target project by exploiting the market environment. In addition, we discriminatively model the market competition and market evolution by designing two graph-based neural network architectures and incorporating them into the joint optimization stage. Finally, we conduct extensive experiments on the real-world crowdfunding data collected from Indiegogo.com. The experimental results clearly demonstrate the effectiveness of our proposed model for modeling and estimating the early fundraising performance of the target project.
Submitted 13 December, 2019;
originally announced December 2019.
-
Pyramid Convolutional RNN for MRI Image Reconstruction
Authors:
Eric Z. Chen,
Puyang Wang,
Xiao Chen,
Terrence Chen,
Shanhui Sun
Abstract:
Fast and accurate MRI image reconstruction from undersampled data is crucial in clinical practice. Deep learning based reconstruction methods have shown promising advances in recent years. However, recovering fine details from undersampled data is still challenging. In this paper, we introduce a novel deep learning based method, Pyramid Convolutional RNN (PC-RNN), to reconstruct images from multiple scales. Based on the formulation of MRI reconstruction as an inverse problem, we design the PC-RNN model with three convolutional RNN (ConvRNN) modules to iteratively learn the features in multiple scales. Each ConvRNN module reconstructs images at a different scale, and the reconstructed images are combined by a final CNN module in a pyramid fashion. The multi-scale ConvRNN modules learn a coarse-to-fine image reconstruction. Unlike other common reconstruction methods for parallel imaging, PC-RNN does not employ coil sensitivity maps for multi-coil data and directly models the multiple coils as multi-channel inputs. The coil compression technique is applied to standardize data with various coil numbers, leading to more efficient training. We evaluate our model on the fastMRI knee and brain datasets, and the results show that the proposed model outperforms other methods and can recover more details. The proposed method is one of the winning solutions in the 2019 fastMRI competition.
Submitted 21 February, 2022; v1 submitted 1 December, 2019;
originally announced December 2019.
-
Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation
Authors:
Junliang Guo,
Xu Tan,
Linli Xu,
Tao Qin,
Enhong Chen,
Tie-Yan Liu
Abstract:
Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models. Considering that AT models have higher accuracy and are easier to train than NAT models, and both of them share the same model configurations, a natural idea to improve the accuracy of NAT models is to transfer a well-trained AT model to an NAT model through fine-tuning. However, since AT and NAT models differ greatly in training strategy, straightforward fine-tuning does not work well. In this work, we introduce curriculum learning into fine-tuning for NAT. Specifically, we design a curriculum in the fine-tuning process to progressively switch the training from autoregressive generation to non-autoregressive generation. Experiments on four benchmark translation datasets show that the proposed method achieves good improvement (more than $1$ BLEU score) over previous NAT baselines in terms of translation accuracy, and greatly speeds up (more than $10$ times) the inference process over AT baselines.
Submitted 21 November, 2019; v1 submitted 20 November, 2019;
originally announced November 2019.
-
Balanced One-shot Neural Architecture Optimization
Authors:
Renqian Luo,
Tao Qin,
Enhong Chen
Abstract:
The ability to rank candidate architectures is the key to the performance of neural architecture search (NAS). One-shot NAS is proposed to reduce the expense but shows inferior performance against conventional NAS and is not adequately stable. We investigate this and find that the ranking correlation between architectures under one-shot training and the ones under stand-alone full training is poor, which misleads the algorithm in its search for better architectures. Further, we show that the training of architectures of different sizes under the current one-shot method is imbalanced, which causes the evaluated performances of the architectures to be less predictive of their ground-truth performances and affects the ranking correlation heavily. Consequently, we propose Balanced NAO, where we introduce balanced training of the supernet during the search procedure to encourage more updates for large architectures than small architectures by sampling architectures in proportion to their model sizes. Comprehensive experiments verify that our proposed method is effective and robust, which leads to a more stable search. The final discovered architecture shows significant improvements against baselines with a test error rate of 2.60% on CIFAR-10 and top-1 accuracy of 74.4% on ImageNet under the mobile setting. Code and model checkpoints will be publicly available. The code is available at github.com/renqianluo/NAO_pytorch.
Submitted 31 March, 2020; v1 submitted 24 September, 2019;
originally announced September 2019.
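The balanced-training idea in the Balanced NAO abstract above, sampling architectures in proportion to their model sizes, amounts to a weighted draw. The snippet below is a minimal sketch under assumptions: the function name `sample_architectures` and the toy size list are hypothetical, and how model size is measured (for example, parameter count) is not specified by the abstract.

```python
import numpy as np


def sample_architectures(model_sizes, num_samples, rng):
    """Draw architecture indices with probability proportional to model size."""
    sizes = np.asarray(model_sizes, dtype=float)
    probs = sizes / sizes.sum()  # normalize sizes into a probability vector
    return rng.choice(len(sizes), size=num_samples, p=probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sizes = [1.0, 2.0, 7.0]  # toy sizes (assumed units, e.g. millions of parameters)
    picks = sample_architectures(sizes, 10_000, rng)
    # Larger architectures should be picked (and hence updated) more often.
    print(np.bincount(picks, minlength=len(sizes)))
```

Uniform sampling would correspond to passing equal sizes; the proportional weights are what shift more supernet updates toward the larger candidates.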
-
Neural Cognitive Diagnosis for Intelligent Education Systems
Authors:
Fei Wang,
Qi Liu,
Enhong Chen,
Zhenya Huang,
Yuying Chen,
Yu Yin,
Zai Huang,
Shijin Wang
Abstract:
Cognitive diagnosis is a fundamental issue in intelligent education, which aims to discover the proficiency level of students on specific knowledge concepts. Existing approaches usually mine linear interactions of student exercising process by manual-designed function (e.g., logistic function), which is not sufficient for capturing complex relations between students and exercises. In this paper, we propose a general Neural Cognitive Diagnosis (NeuralCD) framework, which incorporates neural networks to learn the complex exercising interactions, for getting both accurate and interpretable diagnosis results. Specifically, we project students and exercises to factor vectors and leverage multi neural layers for modeling their interactions, where the monotonicity assumption is applied to ensure the interpretability of both factors. Furthermore, we propose two implementations of NeuralCD by specializing the required concepts of each exercise, i.e., the NeuralCDM with traditional Q-matrix and the improved NeuralCDM+ exploring the rich text content. Extensive experimental results on real-world datasets show the effectiveness of NeuralCD framework with both accuracy and interpretability.
Submitted 3 March, 2020; v1 submitted 23 August, 2019;
originally announced August 2019.
-
Exploiting Cognitive Structure for Adaptive Learning
Authors:
Qi Liu,
Shiwei Tong,
Chuanren Liu,
Hongke Zhao,
Enhong Chen,
Haiping Ma,
Shijin Wang
Abstract:
Adaptive learning, also known as adaptive teaching, relies on learning path recommendation, which sequentially recommends personalized learning items (e.g., lectures, exercises) to satisfy the unique needs of each learner. Although it is well known that modeling the cognitive structure including knowledge level of learners and knowledge structure (e.g., the prerequisite relations) of learning items is important for learning path recommendation, existing methods for adaptive learning often separately focus on either knowledge levels of learners or knowledge structure of learning items. To fully exploit the multifaceted cognitive structure for learning path recommendation, we propose a Cognitive Structure Enhanced framework for Adaptive Learning, named CSEAL. By viewing path recommendation as a Markov Decision Process and applying an actor-critic algorithm, CSEAL can sequentially identify the right learning items to different learners. Specifically, we first utilize a recurrent neural network to trace the evolving knowledge levels of learners at each learning step. Then, we design a navigation algorithm on the knowledge structure to ensure the logicality of learning paths, which reduces the search space in the decision process. Finally, the actor-critic algorithm is used to determine what to learn next and whose parameters are dynamically updated along the learning path. Extensive experiments on real-world data demonstrate the effectiveness and robustness of CSEAL.
Submitted 23 May, 2019;
originally announced May 2019.
-
Transcribing Content from Structural Images with Spotlight Mechanism
Authors:
Yu Yin,
Zhenya Huang,
Enhong Chen,
Qi Liu,
Fuzheng Zhang,
Xing Xie,
Guoping Hu
Abstract:
Transcribing content from structural images, e.g., writing notes from music scores, is a challenging task as not only the content objects should be recognized, but the internal structure should also be preserved. Existing image recognition methods mainly work on images with simple content (e.g., text lines with characters), but are not capable of identifying ones with more complex content (e.g., structured symbols), which often follow a fine-grained grammar. To this end, in this paper, we propose a hierarchical Spotlight Transcribing Network (STN) framework followed by a two-stage "where-to-what" solution. Specifically, we first decide "where-to-look" through a novel spotlight mechanism to focus on different areas of the original image following its structure. Then, we decide "what-to-write" by developing a GRU based network with the spotlight areas for transcribing the content accordingly. Moreover, we propose two implementations on the basis of STN, i.e., STNM and STNR, where the spotlight movement follows the Markov property and Recurrent modeling, respectively. We also design a reinforcement method to refine the framework by self-improving the spotlight mechanism. We conduct extensive experiments on many structural image datasets, where the results clearly demonstrate the effectiveness of the STN framework.
Submitted 26 May, 2019;
originally announced May 2019.
-
QuesNet: A Unified Representation for Heterogeneous Test Questions
Authors:
Yu Yin,
Qi Liu,
Zhenya Huang,
Enhong Chen,
Wei Tong,
Shijin Wang,
Yu Su
Abstract:
Understanding learning materials (e.g., test questions) is a crucial issue in online learning systems, which can promote many applications in the education domain. Unfortunately, many supervised approaches suffer from the problem of scarce human-labeled data, whereas abundant unlabeled resources are highly underutilized. To alleviate this problem, an effective solution is to use pre-trained representations for question understanding. However, existing pre-training methods in the NLP area are ill-suited to learning test question representations due to several domain-specific characteristics in education. First, questions usually comprise heterogeneous data including content text, images and side information. Second, questions contain both basic linguistic information and domain logic and knowledge. To this end, in this paper, we propose a novel pre-training method, namely QuesNet, for comprehensively learning question representations. Specifically, we first design a unified framework to aggregate question information with its heterogeneous inputs into a comprehensive vector. Then we propose a two-level hierarchical pre-training algorithm to learn a better understanding of test questions in an unsupervised way. Here, a novel holed language model objective is developed to extract low-level linguistic features, and a domain-oriented objective is proposed to learn high-level logic and knowledge. Moreover, we show that QuesNet can be fine-tuned well for many question-based tasks. We conduct extensive experiments on large-scale real-world question data, where the experimental results clearly demonstrate the effectiveness of QuesNet for question understanding as well as its superior applicability.
Submitted 26 May, 2019;
originally announced May 2019.
-
Low-Rank Principal Eigenmatrix Analysis
Authors:
Krishna Balasubramanian,
Elynn Y. Chen,
Jianqing Fan,
Xiang Wu
Abstract:
Sparse PCA is a widely used technique for high-dimensional data analysis. In this paper, we propose a new method called low-rank principal eigenmatrix analysis. Unlike sparse PCA, the dominant eigenvectors are allowed to be dense but are assumed to have a low-rank structure when matricized appropriately. Such a structure arises naturally in several practical cases: for example, the top eigenvector of a circulant matrix, when matricized appropriately, is a rank-1 matrix. We propose a matricized rank-truncated power method that can be implemented efficiently and establish its computational and statistical properties. Extensive experiments on several synthetic data sets demonstrate the competitive empirical performance of our method.
Submitted 28 April, 2019;
originally announced April 2019.
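
A rough illustration of the matricized rank-truncated power method mentioned in the abstract above may help. The sketch below is not the authors' implementation: it assumes a symmetric input matrix of size pq x pq, reshapes each power-iteration vector into a p x q matrix, and truncates it to a fixed rank with an SVD. The function name, its parameters, and the fixed iteration count are all hypothetical choices made for illustration.

```python
import numpy as np


def rank_truncated_power_method(A, p, q, rank=1, n_iter=100, seed=0):
    """Power iteration on a symmetric (p*q x p*q) matrix A in which every
    iterate is reshaped to p x q and truncated to the given rank.

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(p * q)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = A @ v                               # ordinary power step
        M = w.reshape(p, q)                     # matricize the iterate
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        M_r = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]  # keep top singular triplets
        v = M_r.ravel()                         # vectorize again
        v /= np.linalg.norm(v)                  # renormalize to unit length
    return v.reshape(p, q)
```

The function returns the estimated eigenvector in its matricized form, so its rank can be inspected directly with `np.linalg.svd`.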
-
Helping Effects Against Curse of Dimensionality in Threshold Factor Models for Matrix Time Series
Authors:
Xialu Liu,
Elynn Chen
Abstract:
As is known, factor analysis is a popular method to reduce dimension for high-dimensional data. For matrix data, the dimension reduction can be more effectively achieved through both row and column directions. In this paper, we introduce a threshold factor model to analyze matrix-valued high-dimensional time series data. The factor loadings are allowed to switch between regimes, controlled by a threshold variable. The estimation methods for loading spaces, threshold value, and the number of factors are proposed. The asymptotic properties of these estimators are investigated. Not only the strengths of thresholding and factors, but also their interactions from different directions and different regimes play an important role in the estimation performance. When the thresholding and factors are all strong across regimes, the estimation is immune to the impact that the increase of dimension brings, which breaks the curse of dimensionality. When the thresholding in two directions and factors across regimes have different levels of strength, we show that estimators for loadings and threshold value experience 'helping' effects against the curse of dimensionality. We also discover that even when the numbers of factors are overestimated, the estimators are still consistent. The proposed methods are illustrated with both simulated and real examples.
Submitted 15 April, 2019;
originally announced April 2019.
-
An Evaluation of the Human-Interpretability of Explanation
Authors:
Isaac Lage,
Emily Chen,
Jeffrey He,
Menaka Narayanan,
Been Kim,
Sam Gershman,
Finale Doshi-Velez
Abstract:
Recent years have seen a boom in interest in machine learning systems that can provide a human-understandable rationale for their predictions or decisions. However, exactly what kinds of explanation are truly human-interpretable remains poorly understood. This work advances our understanding of what makes explanations interpretable under three specific tasks that users may perform with machine learning systems: simulation of the response, verification of a suggested response, and determining whether the correctness of a suggested response changes under a change to the inputs. Through carefully controlled human-subject experiments, we identify regularizers that can be used to optimize for the interpretability of machine learning systems. Our results show that the type of complexity matters: cognitive chunks (newly defined concepts) affect performance more than variable repetitions, and these trends are consistent across tasks and domains. This suggests that there may exist some common design principles for explanation systems.
Submitted 28 August, 2019; v1 submitted 30 January, 2019;
originally announced February 2019.
-
Modeling Dynamic Transport Network with Matrix Factor Models: with an Application to International Trade Flow
Authors:
Elynn Y. Chen,
Rong Chen
Abstract:
International trade research plays an important role in informing trade policy and sheds light on wider issues relating to poverty, development, migration, productivity, and the economy. With recent advances in information technology, global and regional agencies distribute an enormous amount of internationally comparable trading data among a large number of countries over time, providing a goldmine for empirical analysis of international trade. Meanwhile, an array of new statistical methods has recently been developed for dynamic network analysis. However, these advanced methods have not been utilized for analyzing such massive dynamic cross-country trading data. International trade data can be viewed as a dynamic transport network because it emphasizes the amount of goods moving across a network. Most literature on dynamic network analysis concentrates on the connectivity network that focuses on link formation or deformation rather than the transport moving across the network. We take a different perspective from the pervasive node-and-edge level modeling: the dynamic transport network is modeled as a time series of relational matrices. We adopt a matrix factor model of \cite{wang2018factor}, with a specific interpretation for the dynamic transport network. Under the model, the observed surface network is assumed to be driven by a latent dynamic transport network with lower dimensions. The proposed method is able to unveil the latent dynamic structure and achieve the objective of dimension reduction. We applied the proposed framework and methodology to a data set of monthly trading volumes among 24 countries and regions from 1982 to 2015. Our findings shed light on trading hubs, centrality, trends and patterns of international trade and show change points that match trading policies. The dataset also provides fertile ground for future research on international trade.
Submitted 2 January, 2019;
originally announced January 2019.
-
Neural Architecture Optimization
Authors:
Renqian Luo,
Fei Tian,
Tao Qin,
Enhong Chen,
Tie-Yan Liu
Abstract:
Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, whether based on reinforcement learning or evolutionary algorithms (EA), conduct architecture search in a discrete space, which is highly inefficient. In this paper, we propose a simple and efficient method for automatic neural architecture design based on continuous optimization. We call this new approach neural architecture optimization (NAO). There are three key components in our proposed approach: (1) An encoder embeds/maps neural network architectures into a continuous space. (2) A predictor takes the continuous representation of a network as input and predicts its accuracy. (3) A decoder maps a continuous representation of a network back to its architecture. The performance predictor and the encoder enable us to perform gradient-based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy. Such a better embedding is then decoded to a network by the decoder. Experiments show that the architecture discovered by our method is very competitive for the image classification task on CIFAR-10 and the language modeling task on PTB, outperforming or on par with the best results of previous architecture search methods with a significant reduction in computational resources. Specifically, we obtain a 1.93% test set error rate for the CIFAR-10 image classification task and a test set perplexity of 56.0 for the PTB language modeling task. Furthermore, combined with the recently proposed weight sharing mechanism, we discover powerful architectures on CIFAR-10 (with error rate 2.93%) and on PTB (with test set perplexity 56.6), with very limited computational resources (less than 10 GPU hours) for both tasks.
Submitted 4 September, 2019; v1 submitted 22 August, 2018;
originally announced August 2018.
-
Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions
Authors:
Zaiyi Chen,
Zhuoning Yuan,
Jinfeng Yi,
Bowen Zhou,
Enhong Chen,
Tianbao Yang
Abstract:
Although the stochastic gradient descent (SGD) method and its variants (e.g., stochastic momentum methods, AdaGrad) are the algorithms of choice for solving non-convex problems (especially deep learning), there still remain big gaps between the theory and the practice with many questions unresolved. For example, there is still a lack of theories of convergence for SGD and its variants that use stagewise step size and return an averaged solution in practice. In addition, theoretical insights into why the adaptive step size of AdaGrad could improve on the non-adaptive step size of SGD are still missing for non-convex optimization. This paper aims to address these questions and fill the gap between theory and practice. We propose a universal stagewise optimization framework for a broad family of non-smooth non-convex (namely weakly convex) problems with the following key features: (i) at each stage any suitable stochastic convex optimization algorithm (e.g., SGD or AdaGrad) that returns an averaged solution can be employed for minimizing a regularized convex problem; (ii) the step size is decreased in a stagewise manner; (iii) an averaged solution is returned as the final solution that is selected from all stagewise averaged solutions with sampling probabilities increasing as the stage number. Our theoretical results for stagewise AdaGrad exhibit its adaptive convergence, and therefore shed light on its faster convergence for problems with sparse stochastic gradients than stagewise SGD. To the best of our knowledge, these new results are the first of their kind for addressing the unresolved issues of existing theories mentioned earlier. Besides theoretical contributions, our empirical studies show that our stagewise SGD and AdaGrad improve the generalization performance of existing variants/implementations of SGD and AdaGrad.
Submitted 5 March, 2019; v1 submitted 19 August, 2018;
originally announced August 2018.
-
Enhancing Network Embedding with Auxiliary Information: An Explicit Matrix Factorization Perspective
Authors:
Junliang Guo,
Linli Xu,
Xunpeng Huang,
Enhong Chen
Abstract:
Recent advances in the field of network embedding have shown that low-dimensional network representation plays a critical role in network analysis. However, most of the existing principles of network embedding do not flexibly incorporate auxiliary information such as content and labels of nodes. In this paper, we take a matrix factorization perspective of network embedding, and incorporate structure, content and label information of the network simultaneously. For structure, we validate that the matrix we construct preserves high-order proximities of the network. Label information can be further integrated into the matrix via the process of random walk sampling to enhance the quality of embedding in an unsupervised manner, i.e., without leveraging downstream classifiers. In addition, we generalize the Skip-Gram Negative Sampling model to integrate the content of the network in a matrix factorization framework. As a consequence, network embedding can be learned in a unified framework integrating network structure and node content as well as label information simultaneously. We demonstrate the efficacy of the proposed model with the tasks of semi-supervised node classification and link prediction on a variety of real-world benchmark network datasets.
Submitted 4 March, 2018; v1 submitted 11 November, 2017;
originally announced November 2017.
-
Multivariate Spatial-temporal Prediction on Latent Low-dimensional Functional Structure with Non-stationarity
Authors:
Elynn Yi Chen,
Qiwei Yao,
Rong Chen
Abstract:
Multivariate spatio-temporal data arise more and more frequently in a wide range of applications; however, there are relatively few general statistical methods that can be readily used and that incorporate spatial, temporal and variable dependencies simultaneously. In this paper, we propose a new approach to represent non-parametrically the linear dependence structure of a multivariate spatio-temporal process in terms of latent common factors. The matrix structure of observations from the multivariate spatio-temporal process is well preserved through the matrix factor model configuration. The spatial loading functions are estimated non-parametrically by sieve approximation and the variable loading matrix is estimated via an eigen-analysis of a symmetric non-negative definite matrix. Though factor decomposition along the space mode is similar to the low-rank approximation methods in spatial statistics, the fundamental difference is that the low-dimensional structure is completely unknown in our setting. Additionally, our method accommodates non-stationarity over space. The estimated loading functions facilitate spatial prediction. For temporal forecasting, we preserve the matrix structure of observations at each time point by utilizing the matrix autoregressive model of order one, MAR(1). Asymptotic properties of the proposed methods are established. Performance of the proposed method is investigated on both synthetic and real datasets.
Submitted 11 November, 2017; v1 submitted 17 October, 2017;
originally announced October 2017.
-
Factor Models for High-Dimensional Dynamic Networks: with Application to International Trade Flow Time Series 1981-2015
Authors:
Elynn Yi Chen,
Rong Chen
Abstract:
Dynamic network analysis has attracted increasing interest in the literature because of the importance of different kinds of dynamic social networks, biological networks, and economic networks. Most available probability and statistical models for dynamic network data are deduced from random graph theory where the networks are characterized on the node and edge level. They are often very restrictive for applications and unscalable to high-dimensional dynamic network data, which is very common nowadays. In this paper, we take a different perspective: the evolving sequence of networks is treated as a time series of network matrices. We adopt a matrix factor model where the observed surface dynamic network is assumed to be driven by a latent dynamic network with lower dimensions. The linear relationship between the surface network and the latent network is characterized by unknown but deterministic loading matrices. The latent network and the corresponding loadings are estimated via an eigenanalysis of a positive definite matrix constructed from the auto-cross-covariances of the network time series, thus capturing the dynamics present in the network. The proposed method is able to unveil the latent dynamic structure and achieve the objective of dimension reduction. Different from other dynamic network analytical methods that build on latent variables, our approach imposes neither any distributional assumptions on the underlying network nor any parametric forms on its covariance function. The latent network is learned directly from the data with little subjective input. We applied the proposed method to the monthly international trade flow data from 1981 to 2015. The results unveil an interesting evolution of the latent trading network and the relations between the latent entities and the countries.
Submitted 17 October, 2017;
originally announced October 2017.
-
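
The eigenanalysis step described in the abstract above can be sketched as follows. The abstract does not spell out the exact matrix, so this sketch assumes one common construction from the matrix factor model literature: for each lag, form the cross-covariances between every pair of columns of the network matrix, then sum their outer products into a p x p non-negative definite matrix whose leading eigenvectors span the estimated row loading space. The function name and its arguments are hypothetical, and the series is assumed to be already centered.

```python
import numpy as np


def estimate_row_loading_space(X, k, max_lag=1):
    """Estimate a p x k orthonormal basis for the row loading space.

    X has shape (T, p, q): a time series of p x q matrices, assumed centered.
    Illustrative sketch under the assumptions stated in the lead-in.
    """
    T, p, q = X.shape
    M = np.zeros((p, p))
    for h in range(1, max_lag + 1):
        # omega[i, j] is the p x p lag-h cross-covariance between
        # column i at time t and column j at time t + h.
        omega = np.einsum("tai,tbj->ijab", X[: T - h], X[h:]) / (T - h)
        # Accumulate sum over (i, j) of omega_ij @ omega_ij.T.
        M += np.einsum("ijab,ijcb->ac", omega, omega)
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]         # keep the k leading eigenvectors
```

Applying the same function to the transposed series, `X.transpose(0, 2, 1)`, gives the analogous estimate for the column loading space.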
Constrained Factor Models for High-Dimensional Matrix-Variate Time Series
Authors:
Elynn Y. Chen,
Ruey S. Tsay,
Rong Chen
Abstract:
High-dimensional matrix-variate time series data are becoming widely available in many scientific fields, such as economics, biology, and meteorology. To achieve significant dimension reduction while preserving the intrinsic matrix structure and temporal dynamics in such data, Wang et al. (2017) proposed a matrix factor model that is shown to provide effective analysis. In this paper, we establish a general framework for incorporating domain or prior knowledge in the matrix factor model through linear constraints. The proposed framework is shown to be useful in achieving parsimonious parameterization, facilitating interpretation of the latent matrix factor, and identifying specific factors of interest. Fully utilizing the prior-knowledge-induced constraints results in more efficient and accurate modeling, inference, dimension reduction as well as a clear and better interpretation of the results. In this paper, constrained, multi-term, and partially constrained factor models for matrix-variate time series are developed, with efficient estimation procedures and their asymptotic properties. We show that the convergence rates of the constrained factor loading matrices are much faster than those of the conventional matrix factor analysis under many situations. Simulation studies are carried out to demonstrate the finite-sample performance of the proposed method and its associated asymptotic properties. We illustrate the proposed model with three applications, where the constrained matrix-factor models outperform their unconstrained counterparts in the power of variance explanation under the out-of-sample 10-fold cross-validation setting.
Submitted 19 October, 2022; v1 submitted 16 October, 2017;
originally announced October 2017.
-
YouTube-8M Video Understanding Challenge Approach and Applications
Authors:
Edward Chen
Abstract:
This paper introduces the YouTube-8M Video Understanding Challenge hosted as a Kaggle competition and also describes my approach to experimenting with various models. For each of my experiments, I provide the score result as well as possible improvements to be made. Towards the end of the paper, I discuss the various ensemble learning techniques that I applied to the dataset, which significantly boosted my overall competition score. Finally, I discuss the exciting future of video understanding research and also the many applications that such research could significantly improve.
Submitted 26 June, 2017;
originally announced June 2017.
-
Characteristic Direction Approach to Identify Differentially Expressed Genes
Authors:
Neil R. Clark,
Kevin Hu,
Edward Y. Chen,
Qioanan Duan,
Avi Ma`ayan
Abstract:
Genome-wide gene expression profiles, as measured with microarrays or RNA-Seq experiments, have revolutionized biological and biomedical research by providing a quantitative measure of the entire mRNA transcriptome. Typically, researchers set up experiments where control samples are compared to a treatment condition, and using the t-test they identify differentially expressed genes upon which further analysis and ultimately biological discovery from such experiments is based. Here we describe an alternative geometrical approach to identify differentially expressed genes. We show that this alternative method, called the Characteristic Direction, is capable of identifying more relevant genes. We evaluate our approach in three case studies. In the first two, we match transcription factor targets determined by ChIP-seq profiling with differentially expressed genes after the same transcription factor knockdown or over-expression in mammalian cells. In the third case study, we evaluate the quality of enriched terms when comparing normal epithelial cells with cancer stem cells. In conclusion, we demonstrate that the Characteristic Direction approach is much better at calling the significantly differentially expressed genes and should replace the currently widely used t-test method for this purpose. Implementations of the method in MATLAB, Python and Mathematica are available at: http://www.maayanlab.net/CD.
Submitted 31 July, 2013;
originally announced July 2013.