Search | arXiv e-print repository

arXiv:2405.01986 [pdf]

A comparison of regression models for static and dynamic prediction of a prognostic outcome during admission in electronic health care records

Authors: Shan Gao, Elena Albu, Hein Putter, Pieter Stijnen, Frank Rademakers, Veerle Cossey, Yves Debaveye, Christel Janssens, Ben Van Calster, Laure Wynants

Abstract: Objective: Hospitals register information in the electronic health records (EHR) continuously until discharge or death. As such, there is no censoring for in-hospital outcomes. We aimed to compare different dynamic regression modeling approaches to predict central line-associated bloodstream infections (CLABSI) in EHR while accounting for competing events precluding CLABSI. Materials and Methods: We analyzed data from 30,862 catheter episodes at University Hospitals Leuven from 2012 and 2013 to predict 7-day risk of CLABSI. Competing events are discharge and death. Static models at catheter onset included logistic, multinomial logistic, Cox, cause-specific hazard, and Fine-Gray regression. Dynamic models updated predictions daily up to 30 days after catheter onset (i.e. landmarks 0 to 30 days), and included landmark supermodel extensions of the static models, separate Fine-Gray models per landmark time, and regularized multi-task learning (RMTL). Model performance was assessed using 100 random 2:1 train-test splits. Results: The Cox model performed worst of all static models in terms of area under the receiver operating characteristic curve (AUC) and calibration. Dynamic landmark supermodels reached peak AUCs between 0.741-0.747 at landmark 5. The Cox landmark supermodel had the worst AUCs (<=0.731) and calibration up to landmark 7. Separate Fine-Gray models per landmark performed worst for later landmarks, when the number of patients at risk was low. Discussion and Conclusion: Categorical and time-to-event approaches had similar performance in the static and dynamic settings, except Cox models. Ignoring competing risks caused problems for risk prediction in the time-to-event framework (Cox), but not in the categorical framework (logistic regression).

Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

Comments: 3388 words; 3 figures; 4 tables

arXiv:2404.16127 [pdf, other]

Comparison of static and dynamic random forests models for EHR data in the presence of competing risks: predicting central line-associated bloodstream infection

Authors: Elena Albu, Shan Gao, Pieter Stijnen, Frank Rademakers, Christel Janssens, Veerle Cossey, Yves Debaveye, Laure Wynants, Ben Van Calster

Abstract: Prognostic outcomes related to hospital admissions typically do not suffer from censoring, and can be modeled either categorically or as time-to-event. Competing events are common but often ignored. We compared the performance of random forest (RF) models to predict the risk of central line-associated bloodstream infections (CLABSI) using different outcome operationalizations. We included data from 27478 admissions to the University Hospitals Leuven, covering 30862 catheter episodes (970 CLABSI, 1466 deaths and 28426 discharges) to build static and dynamic RF models for binary (CLABSI vs no CLABSI), multinomial (CLABSI, discharge, death or no event), survival (time to CLABSI) and competing risks (time to CLABSI, discharge or death) outcomes to predict the 7-day CLABSI risk. We evaluated model performance across 100 train/test splits. Performance of binary, multinomial and competing risks models was similar: AUROC was 0.74 for baseline predictions, rose to 0.78 for predictions at day 5 in the catheter episode, and decreased thereafter. Survival models overestimated the risk of CLABSI (E:O ratios between 1.2 and 1.6), and had AUROCs about 0.01 lower than other models. Binary and multinomial models had lowest computation times. Models including multiple outcome events (multinomial and competing risks) display a different internal structure compared to binary and survival models. In the absence of censoring, complex modelling choices do not considerably improve the predictive performance compared to a binary model for CLABSI prediction in our studied settings. Survival models censoring the competing events at their time of occurrence should be avoided.

Submitted 24 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.
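
Note on the E:O ratio quoted in the abstract above: an expected-to-observed ratio is commonly computed as the mean predicted risk divided by the observed event proportion, so values above 1 indicate overestimation. The following is a minimal sketch of that calculation, not the authors' code; the function name `expected_observed_ratio` and the toy inputs are hypothetical.

```python
import numpy as np


def expected_observed_ratio(predicted_risk, observed_event):
    """Return mean predicted risk divided by the observed event proportion.

    predicted_risk: predicted probabilities of the event (e.g. 7-day CLABSI).
    observed_event: 0/1 indicators of whether the event actually occurred.
    A value above 1 means the model overestimates risk on average.
    """
    predicted_risk = np.asarray(predicted_risk, dtype=float)
    observed_event = np.asarray(observed_event, dtype=float)
    observed = observed_event.mean()
    if observed == 0:
        raise ValueError("no observed events; the E:O ratio is undefined")
    return predicted_risk.mean() / observed


# Toy data (hypothetical): one event among four episodes.
ratio = expected_observed_ratio([0.2, 0.4, 0.6, 0.8], [0, 0, 1, 0])
print(f"E:O ratio = {ratio:.2f}")
```

In this toy case the mean predicted risk is twice the observed proportion, which is the kind of miscalibration the abstract reports, at a smaller magnitude, for survival models that censor competing events.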

arXiv:2404.08893 [pdf, other]

Early detection of disease outbreaks and non-outbreaks using incidence data

Authors: Shan Gao, Amit K. Chakraborty, Russell Greiner, Mark A. Lewis, Hao Wang

Abstract: Forecasting the occurrence and absence of novel disease outbreaks is essential for disease management. Here, we develop a general model, with no real-world training data, that accurately forecasts outbreaks and non-outbreaks. We propose a novel framework, using a feature-based time series classification method to forecast outbreaks and non-outbreaks. We tested our methods on synthetic data from a Susceptible-Infected-Recovered model for slowly changing, noisy disease dynamics. Outbreak sequences give a transcritical bifurcation within a specified future time window, whereas non-outbreak (null bifurcation) sequences do not. We identified incipient differences in time series of infectives leading to future outbreaks and non-outbreaks. These differences are reflected in 22 statistical features and 5 early warning signal indicators. Classifier performance, given by the area under the receiver-operating curve, ranged from 0.99 for large expanding windows of training data to 0.7 for small rolling windows. Real-world performances of classifiers were tested on two empirical datasets, COVID-19 data from Singapore and SARS data from Hong Kong, with two classifiers exhibiting high accuracy. In summary, we showed that there are statistical features that distinguish outbreak and non-outbreak sequences long before outbreaks occur. We could detect these differences in synthetic and real-world data sets, well before potential outbreaks occur.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2403.16233 [pdf, other]

An early warning indicator trained on stochastic disease-spreading models with different noises

Authors: Amit K. Chakraborty, Shan Gao, Reza Miry, Pouria Ramazi, Russell Greiner, Mark A. Lewis, Hao Wang

Abstract: The timely detection of disease outbreaks through reliable early warning signals (EWSs) is indispensable for effective public health mitigation strategies. Nevertheless, the intricate dynamics of real-world disease spread, often influenced by diverse sources of noise and limited data in the early stages of outbreaks, pose a significant challenge in developing reliable EWSs, as the performance of existing indicators varies with extrinsic and intrinsic noises. Here, we address the challenge of modeling disease when the measurements are corrupted by additive white noise, multiplicative environmental noise, and demographic noise into a standard epidemic mathematical model. To navigate the complexities introduced by these noise sources, we employ a deep learning algorithm that provides EWS in infectious disease outbreak by training on noise-induced disease-spreading models. The indicator's effectiveness is demonstrated through its application to real-world COVID-19 cases in Edmonton and simulated time series derived from diverse disease spread models affected by noise. Notably, the indicator captures an impending transition in a time series of disease outbreaks and outperforms existing indicators. This study contributes to advancing early warning capabilities by addressing the intricate dynamics inherent in real-world disease spread, presenting a promising avenue for enhancing public health preparedness and response efforts.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2312.00032 [pdf, other]

An algorithm for forensic toolmark comparisons

Authors: Maria Cuellar, Sheng Gao, Heike Hofmann

Abstract: Forensic toolmark analysis traditionally relies on subjective human judgment, leading to inconsistencies and lack of transparency. The multitude of variables, including angles and directions of mark generation, further complicates comparisons. To address this, we first generate a dataset of 3D toolmarks from various angles and directions using consecutively manufactured slotted screwdrivers. By using PAM clustering, we find that there is clustering by tool rather than angle or direction. Using Known Match and Known Non-Match densities, we establish thresholds for classification. Fitting Beta distributions to the densities, we allow for the derivation of likelihood ratios for new toolmark pairs. With a cross-validated sensitivity of 98% and specificity of 96%, our approach enhances the reliability of toolmark analysis. This approach is applicable to slotted screwdrivers, and for screwdrivers that are made with a similar production method. With data collection of other tools and factors, it could be applied to compare toolmarks of other types. This empirically trained, open-source solution offers forensic examiners a standardized means to objectively compare toolmarks, potentially decreasing the number of miscarriages of justice in the legal system.

Submitted 7 June, 2024; v1 submitted 19 November, 2023; originally announced December 2023.

Comments: Revised text, results unchanged

arXiv:2211.14723 [pdf, other]

Asymptotic Optimality of Myopic Ranking and Selection Procedures

Authors: Yanwen Li, Siyang Gao, Zhongshun Shi

Abstract: Ranking and selection (R&S) is a popular model for studying discrete-event dynamic systems. It aims to select the best design (the design with the largest mean performance) from a finite set, where the mean of each design is unknown and has to be learned by samples. Great research efforts have been devoted to this problem in the literature for developing procedures with superior empirical performance and showing their optimality. In these efforts, myopic procedures were popular. They select the best design using a 'naive' mechanism of iteratively and myopically improving an approximation of the objective measure. Although they are based on simple heuristics and lack theoretical support, they turned out highly effective, and often achieved competitive empirical performance compared to procedures that were proposed later and shown to be asymptotically optimal. In this paper, we theoretically analyze these myopic procedures and prove that they also satisfy the optimality conditions of R&S, just like some other popular R&S methods. It explains the good performance of myopic procedures in various numerical tests, and provides good insight into the structure and theoretical development of efficient R&S procedures.

Submitted 26 November, 2022; originally announced November 2022.

arXiv:2211.14722 [pdf, other]

Convergence Rate Analysis for Optimal Computing Budget Allocation Algorithms

Authors: Yanwen Li, Siyang Gao

Abstract: Ordinal optimization (OO) is a widely-studied technique for optimizing discrete-event dynamic systems (DEDS). It evaluates the performance of the system designs in a finite set by sampling and aims to correctly make ordinal comparison of the designs. A well-known method in OO is the optimal computing budget allocation (OCBA). It builds the optimality conditions for the number of samples allocated to each design, and the sample allocation that satisfies the optimality conditions is shown to asymptotically maximize the probability of correct selection for the best design. In this paper, we investigate two popular OCBA algorithms. With known variances for samples of each design, we characterize their convergence rates with respect to different performance measures. We first demonstrate that the two OCBA algorithms achieve the optimal convergence rate under measures of probability of correct selection and expected opportunity cost. It fills the void of convergence analysis for OCBA algorithms. Next, we extend our analysis to the measure of cumulative regret, a main measure studied in the field of machine learning. We show that with minor modification, the two OCBA algorithms can reach the optimal convergence rate under cumulative regret. It indicates the potential of broader use of algorithms designed based on the OCBA optimality conditions.

Submitted 28 November, 2022; v1 submitted 26 November, 2022; originally announced November 2022.
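
Note on the OCBA optimality conditions mentioned in the abstract above: in the classical formulation for normally distributed samples with known standard deviations, each non-best design i receives a budget share proportional to (sigma_i / delta_i)^2, where delta_i is the gap between the best mean and the mean of design i, and the best design b receives a share proportional to sigma_b * sqrt(sum over i != b of (share_i / sigma_i)^2). The sketch below implements that textbook rule under the assumptions that larger means are better and the best design is unique. The abstract does not say whether the two algorithms analyzed in the paper take exactly this form, and the function name `ocba_fractions` is hypothetical.

```python
import numpy as np


def ocba_fractions(means, stds):
    """Return the classical OCBA budget fractions for each design.

    means: (estimated) mean performance of each design; larger is better.
    stds:  known standard deviation of a single sample from each design.
    Assumes a unique best design, so every gap to the best is positive.
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    best = int(np.argmax(means))
    others = np.arange(len(means)) != best

    gaps = means[best] - means[others]
    if np.any(gaps <= 0):
        raise ValueError("the best design must be unique")

    ratios = np.empty_like(means)
    # Non-best designs: share proportional to (sigma_i / delta_i)^2.
    ratios[others] = (stds[others] / gaps) ** 2
    # Best design: sigma_b * sqrt(sum_i (share_i / sigma_i)^2).
    ratios[best] = stds[best] * np.sqrt(np.sum((ratios[others] / stds[others]) ** 2))
    return ratios / ratios.sum()


# Toy example (hypothetical values): design 0 is best, design 1 is a close rival.
print(ocba_fractions(means=[1.0, 0.8, 0.2], stds=[1.0, 1.0, 1.0]))
```

The rule concentrates sampling on the best design and on competitors whose means are close to it or whose noise is large, which is the intuition behind maximizing the probability of correct selection.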

arXiv:2211.13685 [pdf, other]

Convergence Analysis of Stochastic Kriging-Assisted Simulation with Random Covariates

Authors: Cheng Li, Siyang Gao, Jianzhong Du

Abstract: We consider performing simulation experiments in the presence of covariates. Here, covariates refer to some input information other than system designs to the simulation model that can also affect the system performance. To make decisions, decision makers need to know the covariate values of the problem. Traditionally in simulation-based decision making, simulation samples are collected after the covariate values are known; in contrast, as a new framework, simulation with covariates starts the simulation before the covariate values are revealed, and collects samples on covariate values that might appear later. Then, when the covariate values are revealed, the collected simulation samples are directly used to predict the desired results. This framework significantly reduces the decision time compared to the traditional way of simulation. In this paper, we follow this framework and suppose there are a finite number of system designs. We adopt the metamodel of stochastic kriging (SK) and use it to predict the system performance of each design and the best design. The goal is to study how fast the prediction errors diminish with the number of covariate points sampled. This is a fundamental problem in simulation with covariates and helps quantify the relationship between the offline simulation efforts and the online prediction accuracy. Particularly, we adopt measures of the maximal integrated mean squared error (IMSE) and integrated probability of false selection (IPFS) for assessing errors of the system performance and the best design predictions. Then, we establish convergence rates for the two measures under mild conditions. Last, these convergence behaviors are illustrated numerically using test examples.

Submitted 24 November, 2022; originally announced November 2022.

arXiv:2209.07070 [pdf, ps, other]

Fixed-Point Centrality for Networks

Authors: Shuang Gao

Abstract: This paper proposes a family of network centralities called fixed-point centralities. This centrality family is defined via the fixed point of permutation equivariant mappings related to the underlying network. Such a centrality notion is immediately extended to define fixed-point centralities for infinite graphs characterized by graphons. Variation bounds of such centralities with respect to the variations of the underlying graphs and graphons under mild assumptions are established. Fixed-point centralities connect with a variety of different models on networks including graph neural networks, static and dynamic games on networks, and Markov decision processes.

Submitted 15 September, 2022; originally announced September 2022.

Comments: 8 pages, Accepted for presentation at IEEE Conference on Decision and Control
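
Note on the idea of a centrality defined as a fixed point, as described in the abstract above: one familiar centrality that can be written this way is Katz centrality, which solves x = alpha * A x + 1 for an adjacency matrix A. The map on the right-hand side is permutation equivariant, since relabeling the nodes relabels the result in the same way. The sketch below iterates that map until it converges. It is an illustration chosen here, not the paper's definition, and the function name `fixed_point_centrality` and the value of `alpha` are assumptions.

```python
import numpy as np


def fixed_point_centrality(adjacency, alpha=0.1, tol=1e-12, max_iter=10_000):
    """Iterate x <- alpha * A x + 1 until the update is smaller than tol.

    The iteration converges when alpha is below 1 / (spectral radius of A),
    in which case the limit is the Katz centrality vector.
    """
    A = np.asarray(adjacency, dtype=float)
    x = np.ones(A.shape[0])
    for _ in range(max_iter):
        x_next = alpha * A @ x + 1.0
        if np.max(np.abs(x_next - x)) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge; try a smaller alpha")


# Path graph on three nodes: the middle node is connected to both ends.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
print(fixed_point_centrality(path))
```

On the path graph the middle node receives the largest value and the two end nodes receive equal values, as symmetry requires.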

arXiv:2206.06847 [pdf, other]

On the Finite-Time Performance of the Knowledge Gradient Algorithm

Authors: Yanwen Li, Siyang Gao

Abstract: The knowledge gradient (KG) algorithm is a popular and effective algorithm for the best arm identification (BAI) problem. Due to the complex calculation of KG, theoretical analysis of this algorithm is difficult, and existing results are mostly about the asymptotic performance of it, e.g., consistency, asymptotic sample allocation, etc. In this research, we present new theoretical results about the finite-time performance of the KG algorithm. Under independent and normally distributed rewards, we derive bounds for the sample allocation of the algorithm. With these bounds, existing asymptotic results become simple corollaries. Furthermore, we derive upper and lower bounds for the probability of error and simple regret of the algorithm, and show the performance of the algorithm for the multi-armed bandit (MAB) problem. These developments not only extend the existing analysis of the KG algorithm, but can also be used to analyze other improvement-based algorithms. Last, we use numerical experiments to compare the bounds we derive and the performance of the KG algorithm.

Submitted 4 August, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

arXiv:2206.04236 [pdf, other]

Analytical Composition of Differential Privacy via the Edgeworth Accountant

Authors: Hua Wang, Sheng Gao, Huanyu Zhang, Milan Shen, Weijie J. Su

Abstract: Many modern machine learning algorithms are composed of simple private algorithms; thus, an increasingly important problem is to efficiently compute the overall privacy loss under composition. In this study, we introduce the Edgeworth Accountant, an analytical approach to composing differential privacy guarantees of private algorithms. The Edgeworth Accountant starts by losslessly tracking the privacy loss under composition using the $f$-differential privacy framework, which allows us to express the privacy guarantees using privacy-loss log-likelihood ratios (PLLRs). As the name suggests, this accountant next uses the Edgeworth expansion to upper and lower bound the probability distribution of the sum of the PLLRs. Moreover, by relying on a technique for approximating complex distributions using simple ones, we demonstrate that the Edgeworth Accountant can be applied to the composition of any noise-addition mechanism. Owing to certain appealing features of the Edgeworth expansion, the $(ε, δ)$-differential privacy bounds offered by this accountant are non-asymptotic, with essentially no extra computational cost, as opposed to prior approaches, wherein the running times increase with the number of compositions. Finally, we demonstrate that our upper and lower $(ε, δ)$-differential privacy bounds are tight in federated analytics and certain regimes of training private deep learning models.

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2203.09611 [pdf, other]

doi 10.1080/13658816.2022.2053980

STICC: A multivariate spatial clustering method for repeated geographic pattern discovery with consideration of spatial contiguity

Authors: Yuhao Kang, Kunlin Wu, Song Gao, Ignavier Ng, Jinmeng Rao, Shan Ye, Fan Zhang, Teng Fei

Abstract: Spatial clustering has been widely used for spatial data mining and knowledge discovery. An ideal multivariate spatial clustering should consider both spatial contiguity and aspatial attributes. Existing spatial clustering approaches may face challenges for discovering repeated geographic patterns with spatial contiguity maintained. In this paper, we propose a Spatial Toeplitz Inverse Covariance-Based Clustering (STICC) method that considers both attributes and spatial relationships of geographic objects for multivariate spatial clustering. A subregion is created for each geographic object serving as the basic unit when performing clustering. A Markov random field is then constructed to characterize the attribute dependencies of subregions. Using a spatial consistency strategy, nearby objects are encouraged to belong to the same cluster. To test the performance of the proposed STICC algorithm, we apply it in two use cases. The comparison results with several baseline methods show that the STICC outperforms others significantly in terms of adjusted rand index and macro-F1 score. Join count statistics is also calculated and shows that the spatial contiguity is well preserved by STICC. Such a spatial clustering method may benefit various applications in the fields of geography, remote sensing, transportation, and urban planning, etc.

Submitted 30 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Journal ref: International Journal of Geographical Information Science, Year 2022

arXiv:2201.02926 [pdf]

Variational design for a structural family of CAD models

Authors: Qiang Zou, Qiqiang Zheng, Zhihong Tang, Shuming Gao

Abstract: Variational design is a well-recognized CAD technique due to the increased design efficiency. It often presents as a parametric family of CAD models. Although effective, this way of working cannot handle design requirements that go beyond parametric changes. Such design requirements are not uncommon today due to the increasing popularity of product customization. In particular, there is often a need for designing a new model out of an existing structural family of models, which share a structural pattern but have individually varied detail features. To facilitate such design requirements, a new method is presented in this paper. The idea is to express the underlying structural pattern in terms of a submodel composed of the maximum common design features of the family, and then to build a single master model by attaching to the submodel all detail design features in the family. This master model is a representative model for the family and contains all the features. By removing unwanted detail features and adding new features, the master model can be easily adapted into a new design, while keeping aligned with the family, structurally. Effectiveness of this method has been validated by a series of case studies and comparisons of increasing complexity.

Submitted 8 January, 2022; originally announced January 2022.

Comments: 12 pages, 11 figures, journal paper

ACM Class: I.3.5

arXiv:2107.01152 [pdf, other]

Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE

Authors: Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, Shuyang Gao, Tagyoung Chung, Yi Xu, Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin, Chenyang Tao

Abstract: InfoNCE-based contrastive representation learners, such as SimCLR, have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource demanding, as their effectiveness breaks down with small-batch training (i.e., the log-K curse, whereas K is the batch-size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size regime, and present a novel simple, non-trivial contrastive objective named FlatNCE, which fixes this issue. Unlike InfoNCE, our FlatNCE no longer explicitly appeals to a discriminative classification goal for contrastive learning. Theoretically, we show FlatNCE is the mathematical dual formulation of InfoNCE, thus bridging the classical literature on energy modeling; and empirically, we demonstrate that, with minimal modification of code, FlatNCE enables immediate performance boost independent of the subject-matter engineering efforts. The significance of this work is furthered by the powerful generalization of contrastive learning techniques, and the introduction of new tools to monitor and diagnose contrastive training. We substantiate our claims with empirical evidence on CIFAR10, ImageNet, and other datasets, where FlatNCE consistently outperforms InfoNCE.

Submitted 2 July, 2021; originally announced July 2021.
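
Note on the log-K limit referred to in the abstract above: for a batch of K pairs, the standard InfoNCE loss is the cross-entropy of picking the positive among K candidates, and the mutual-information estimate log K minus the loss can never exceed log K, so small batches cap what the objective can report. The sketch below computes the standard InfoNCE loss with NumPy to make that cap visible. It does not implement FlatNCE, and the function name `info_nce_loss` and the toy score matrices are hypothetical.

```python
import numpy as np


def info_nce_loss(scores):
    """Standard InfoNCE loss for a K x K score matrix.

    scores[i, j] is the similarity between anchor i and candidate j;
    the positive for anchor i sits on the diagonal (j == i).
    """
    scores = np.asarray(scores, dtype=float)
    # Numerically stable log-sum-exp over each row.
    row_max = scores.max(axis=1, keepdims=True)
    log_sum_exp = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    return float(np.mean(log_sum_exp - np.diag(scores)))


K = 4
uninformative = np.zeros((K, K))   # every candidate scores the same
near_perfect = 50.0 * np.eye(K)    # positives score far above negatives

for name, s in [("uninformative", uninformative), ("near perfect", near_perfect)]:
    bound = np.log(K) - info_nce_loss(s)
    print(f"{name}: MI estimate = {bound:.4f} (cap log K = {np.log(K):.4f})")
```

Because the log-sum-exp of a row is never smaller than its diagonal entry, the loss is non-negative, and the estimate therefore saturates at log K however well the scores separate positives from negatives.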

arXiv:2107.00371 [pdf, other]

Sparse GCA and Thresholded Gradient Descent

Authors: Sheng Gao, Zongming Ma

Abstract: Generalized correlation analysis (GCA) is concerned with uncovering linear relationships across multiple datasets. It generalizes canonical correlation analysis that is designed for two datasets. We study sparse GCA when there are potentially multiple generalized correlation tuples in data and the loading matrix has a small number of nonzero rows. It includes sparse CCA and sparse PCA of correlation matrices as special cases. We first formulate sparse GCA as generalized eigenvalue problems at both population and sample levels via a careful choice of normalization constraints. Based on a Lagrangian form of the sample optimization problem, we propose a thresholded gradient descent algorithm for estimating GCA loading vectors and matrices in high dimensions. We derive tight estimation error bounds for estimators generated by the algorithm with proper initialization. We also demonstrate the prowess of the algorithm on a number of synthetic datasets.

Submitted 6 February, 2023; v1 submitted 1 July, 2021; originally announced July 2021.

arXiv:2007.06680 [pdf, other]

Momentum-Based Policy Gradient Methods

Authors: Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Abstract: In the paper, we propose a class of efficient momentum-based policy gradient methods for the model-free reinforcement learning, which use adaptive learning rates and do not require any large batches. Specifically, we propose a fast important-sampling momentum-based policy gradient (IS-MBPG) method based on a new momentum-based variance reduced technique and the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method based on the momentum-based variance reduced technique and the Hessian-aided technique. Moreover, we prove that both the IS-MBPG and HA-MBPG methods reach the best known sample complexity of $O(ε^{-3})$ for finding an $ε$-stationary point of the non-concave performance function, which only require one trajectory at each iteration. In particular, we present a non-adaptive version of IS-MBPG method, i.e., IS-MBPG*, which also reaches the best known sample complexity of $O(ε^{-3})$ without any large batches. In the experiments, we apply four benchmark tasks to demonstrate the effectiveness of our algorithms.

Submitted 6 August, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

Comments: ICML 2020, 24 pages

arXiv:2006.08051 [pdf, other]

Provably Efficient Model-based Policy Adaptation

Authors: Yuda Song, Aditi Mavalankar, Wen Sun, Sicun Gao

Abstract: The high sample complexity of reinforcement learning challenges its use in practice. A promising approach is to quickly adapt pre-trained policies to new environments. Existing methods for this policy adaptation problem typically rely on domain randomization and meta-learning, by sampling from some distribution of target environments during pre-training, and thus face difficulty on out-of-distribution target environments. We propose new model-based mechanisms that are able to make online adaptation in unseen target environments, by combining ideas from no-regret online learning and adaptive control. We prove that the approach learns policies in the target environment that can quickly recover trajectories from the source environment, and establish the rate of convergence in general settings. We demonstrate the benefits of our approach for policy adaptation in a diverse set of continuous control tasks, achieving the performance of state-of-the-art methods with much lower sample complexity.

Submitted 14 June, 2020; originally announced June 2020.

arXiv:2005.07567 [pdf]

Accelerating drug repurposing for COVID-19 via modeling drug mechanism of action with large scale gene-expression profiles

Authors: Lu Han, G. C. Shan, B. F. Chu, H. Y. Wang, Z. J. Wang, S. Q. Gao, W. X. Zhou

Abstract: The novel coronavirus disease, named COVID-19, emerged in China in December 2019, and has rapidly spread around the world. It is clearly urgent to fight COVID-19 at global scale. The development of methods for identifying drug uses based on phenotypic data can improve the efficiency of drug development. However, there are still many difficulties in identifying drug applications based on cell picture data. This work reported one state-of-the-art machine learning method to identify drug uses based on the cell image features of 1024 drugs generated in the LINCS program. Because the multi-dimensional features of the image are affected by non-experimental factors, the characteristics of similar drugs vary greatly, and the current sample number is not enough to use deep learning and other methods are used for learning optimization. As a consequence, this study is based on the supervised ITML algorithm to convert the characteristics of drugs. The results show that the characteristics of ITML conversion are more conducive to the recognition of drug functions. The analysis of feature conversion shows that different features play important roles in identifying different drug functions. For the current COVID-19, Chloroquine and Hydroxychloroquine achieve antiviral effects by inhibiting endocytosis, etc., and were classified to the same community. And Clomiphene in the same community inhibited the entry of Ebola Virus, indicated a similar MoAs that could be reflected by cell image.

Submitted 5 October, 2021; v1 submitted 15 May, 2020; originally announced May 2020.

Comments: 22 pages, 4 figures. Cognitive Neurodynamics (2021)

arXiv:2005.05783 [pdf]

Modeling Route Choice with Real-Time Information: Comparing the Recursive and Non-Recursive Models

Authors: Xinlian Yu, Tien Mai, Jing Ding-Mastera, Song Gao, Emma Frejinger

Abstract: We study the routing policy choice problems in a stochastic time-dependent (STD) network. A routing policy is defined as a decision rule applied at the end of each link that maps the realized traffic condition to the decision on the link to take next. Two types of routing policy choice models are formulated with perfect online information (POI): recursive logit model and non-recursive logit model. In the non-recursive model, a choice set of routing policies between an origin-destination (OD) pair is generated, and a probabilistic choice is modeled at the origin, while the choice of the next link at each link is a deterministic execution of the chosen routing policy. In the recursive model, the probabilistic choice of the next link is modeled at each link, following the framework of dynamic discrete choice models. The two models are further compared in terms of computational efficiency in estimation and prediction, and flexibility in systematic utility specification and modeling correlation.

Submitted 4 June, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

arXiv:2005.00611 [pdf, other]

Neural Lyapunov Control

Authors: Ya-Chien Chang, Nima Roohi, Sicun Gao

Abstract: We propose new methods for learning control policies and neural network Lyapunov functions for nonlinear control problems, with provable guarantee of stability. The framework consists of a learner that attempts to find the control and Lyapunov functions, and a falsifier that finds counterexamples to quickly guide the learner towards solutions. The procedure terminates when no counterexample is found by the falsifier, in which case the controlled nonlinear system is provably stable. The approach significantly simplifies the process of Lyapunov control design, provides end-to-end correctness guarantee, and can obtain much larger regions of attraction than existing methods such as LQR and SOS/SDP. We show experiments on how the new methods obtain high-quality solutions for challenging control problems.

Submitted 22 September, 2022; v1 submitted 1 May, 2020; originally announced May 2020.

Comments: NeurIPS 2019

arXiv:1910.03487 [pdf, other]

Controlled Text Generation for Data Augmentation in Intelligent Artificial Agents

Authors: Nikolaos Malandrakis, Minmin Shen, Anuj Goyal, Shuyang Gao, Abhishek Sethi, Angeliki Metallinou

Abstract: Data availability is a bottleneck during early stages of development of new capabilities for intelligent artificial agents. We investigate the use of text generation techniques to augment the training data of a popular commercial artificial agent across categories of functionality, with the goal of faster development of new functionality. We explore a variety of encoder-decoder generative models for synthetic training data generation and propose using conditional variational auto-encoders. Our approach requires only direct optimization, works well with limited data and significantly outperforms the previous controlled text generation techniques. Further, the generated data are used as additional training samples in an extrinsic intent classification task, leading to improved performance by up to 5\% absolute f-score in low-resource cases, validating the usefulness of our approach.

Submitted 4 October, 2019; originally announced October 2019.

Comments: EMNLP WNGT workshop

arXiv:1907.13463 [pdf, other]

Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query Complexity

Authors: Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Abstract: Zeroth-order (a.k.a, derivative-free) methods are a class of effective optimization methods for solving complex machine learning problems, where gradients of the objective functions are not available or computationally prohibitive. Recently, although many zeroth-order methods have been developed, these approaches still have two main drawbacks: 1) high function query complexity; 2) not being well suitable for solving the problems with complex penalties and constraints. To address these challenging drawbacks, in this paper, we propose a class of faster zeroth-order stochastic alternating direction method of multipliers (ADMM) methods (ZO-SPIDER-ADMM) to solve the nonconvex finite-sum problems with multiple nonsmooth penalties. Moreover, we prove that the ZO-SPIDER-ADMM methods can achieve a lower function query complexity of $O(nd+dn^{\frac{1}{2}}ε^{-1})$ for finding an $ε$-stationary point, which improves the existing best nonconvex zeroth-order ADMM methods by a factor of $O(d^{\frac{1}{3}}n^{\frac{1}{6}})$, where $n$ and $d$ denote the sample size and data dimension, respectively. At the same time, we propose a class of faster zeroth-order online ADMM methods (ZOO-ADMM+) to solve the nonconvex online problems with multiple nonsmooth penalties. We also prove that the proposed ZOO-ADMM+ methods achieve a lower function query complexity of $O(dε^{-\frac{3}{2}})$, which improves the existing best result by a factor of $O(ε^{-\frac{1}{2}})$. Extensive experimental results on the structure adversarial attack on black-box deep neural networks demonstrate the efficiency of our new algorithms.

Submitted 11 December, 2023; v1 submitted 29 July, 2019; originally announced July 2019.

Comments: This paper was accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence

arXiv:1905.12729 [pdf, other]

Zeroth-Order Stochastic Alternating Direction Method of Multipliers for Nonconvex Nonsmooth Optimization

Authors: Feihu Huang, Shangqian Gao, Songcan Chen, Heng Huang

Abstract: Alternating direction method of multipliers (ADMM) is a popular optimization tool for the composite and constrained problems in machine learning. However, in many machine learning problems such as black-box attacks and bandit feedback, ADMM could fail because the explicit gradients of these problems are difficult or infeasible to obtain. Zeroth-order (gradient-free) methods can effectively solve these problems due to that the objective function values are only required in the optimization. Recently, though there exist a few zeroth-order ADMM methods, they build on the convexity of objective function. Clearly, these existing zeroth-order methods are limited in many applications. In the paper, thus, we propose a class of fast zeroth-order stochastic ADMM methods (i.e., ZO-SVRG-ADMM and ZO-SAGA-ADMM) for solving nonconvex problems with multiple nonsmooth penalties, based on the coordinate smoothing gradient estimator. Moreover, we prove that both the ZO-SVRG-ADMM and ZO-SAGA-ADMM have convergence rate of $O(1/T)$, where $T$ denotes the number of iterations. In particular, our methods not only reach the best convergence rate $O(1/T)$ for the nonconvex optimization, but also are able to effectively solve many complex machine learning problems with multiple regularized penalties and constraints. Finally, we conduct the experiments of black-box binary classification and structured adversarial attack on black-box deep neural network to validate the efficiency of our algorithms.

Submitted 29 July, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: To Appear in IJCAI 2019. Supplementary materials are added

arXiv:1905.09993 [pdf, other]

Inference of Dynamic Graph Changes for Functional Connectome

Authors: Dingjue Ji, Junwei Lu, Yiliang Zhang, Hongyu Zhao, Siyuan Gao

Abstract: Dynamic functional connectivity is an effective measure for the brain's responses to continuous stimuli. We propose an inferential method to detect the dynamic changes of brain networks based on time-varying graphical models. Whereas most existing methods focus on testing the existence of change points, the dynamics in the brain network offer more signals in many neuroscience studies. We propose a novel method to conduct hypothesis testing on changes in dynamic brain networks. We introduce a bootstrap statistic to approximate the supreme of the high-dimensional empirical processes over dynamically changing edges. Our simulations show that this framework can capture the change points with changed connectivity. Finally, we apply our method to a brain imaging dataset under a natural audio-video stimulus and illustrate that we are able to detect temporal changes in brain networks. The functions of the identified regions are consistent with specific emotional annotations, which are closely associated with changes inferred by our method.

Submitted 19 June, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

Journal ref: International Conference on Artificial Intelligence and Statistics, 26-28 August 2020, Online, PMLR 108:3230-3240

arXiv:1905.04094 [pdf, other]

Domain Adversarial Reinforcement Learning for Partial Domain Adaptation

Authors: Jin Chen, Xinxiao Wu, Lixin Duan, Shenghua Gao

Abstract: Partial domain adaptation aims to transfer knowledge from a label-rich source domain to a label-scarce target domain which relaxes the fully shared label space assumption across different domains. In this more general and practical scenario, a major challenge is how to select source instances in the shared classes across different domains for positive transfer. To address this issue, we propose a Domain Adversarial Reinforcement Learning (DARL) framework to automatically select source instances in the shared classes for circumventing negative transfer as well as to simultaneously learn transferable features between domains by reducing the domain shift. Specifically, in this framework, we employ deep Q-learning to learn policies for an agent to make selection decisions by approximating the action-value function. Moreover, domain adversarial learning is introduced to learn domain-invariant features for the selected source instances by the agent and the target instances, and also to determine rewards for the agent based on how relevant the selected source instances are to the target domain. Experiments on several benchmark datasets demonstrate the superior performance of our DARL method over existing state-of-the-art methods for partial domain adaptation.

Submitted 10 May, 2019; originally announced May 2019.

arXiv:1904.12604 [pdf, other]

Pre-training of Context-aware Item Representation for Next Basket Recommendation

Authors: Jingxuan Yang, Jun Xu, Jianzhuo Tong, Sheng Gao, Jun Guo, Jirong Wen

Abstract: Next basket recommendation, which aims to predict the next few items that a user most probably purchases given his historical transactions, plays a vital role in market basket analysis. From the viewpoint of item, an item could be purchased by different users together with different items, for different reasons. Therefore, an ideal recommender system should represent an item considering its transaction contexts. Existing state-of-the-art deep learning methods usually adopt the static item representations, which are invariant among all of the transactions and thus cannot achieve the full potentials of deep learning. Inspired by the pre-trained representations of BERT in natural language processing, we propose to conduct context-aware item representation for next basket recommendation, called Item Encoder Representations from Transformers (IERT). In the offline phase, IERT pre-trains deep item representations conditioning on their transaction contexts. In the online recommendation phase, the pre-trained model is further fine-tuned with an additional output layer. The output contextualized item embeddings are used to capture users' sequential behaviors and general tastes to conduct recommendation. Experimental results on the Ta-Feng data set show that IERT outperforms the state-of-the-art baseline methods, which demonstrated the effectiveness of IERT in next basket representation.

Submitted 14 April, 2019; originally announced April 2019.

arXiv:1904.10639 [pdf, other]

Efficient Simulation Budget Allocation for Subset Selection Using Regression Metamodels

Authors: Fei Gao, Zhongshun Shi, Siyang Gao, Hui Xiao

Abstract: This research considers the ranking and selection (R&S) problem of selecting the optimal subset from a finite set of alternative designs. Given the total simulation budget constraint, we aim to maximize the probability of correctly selecting the top-m designs. In order to improve the selection efficiency, we incorporate the information from across the domain into regression metamodels. In this research, we assume that the mean performance of each design is approximately quadratic. To achieve a better fit of this model, we divide the solution space into adjacent partitions such that the quadratic assumption can be satisfied within each partition. Using the large deviation theory, we propose an approximately optimal simulation budget allocation rule in the presence of partitioned domains. Numerical experiments demonstrate that our approach can enhance the simulation efficiency significantly.

Submitted 24 April, 2019; originally announced April 2019.

arXiv:1903.11774 [pdf, ps, other]

How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?

Authors: Quan Vuong, Sharad Vikram, Hao Su, Sicun Gao, Henrik I. Christensen

Abstract: Recently, reinforcement learning (RL) algorithms have demonstrated remarkable success in learning complicated behaviors from minimally processed input. However, most of this success is limited to simulation. While there are promising successes in applying RL algorithms directly on real systems, their performance on more complex systems remains bottle-necked by the relative data inefficiency of RL algorithms. Domain randomization is a promising direction of research that has demonstrated impressive results using RL algorithms to control real robots. At a high level, domain randomization works by training a policy on a distribution of environmental conditions in simulation. If the environments are diverse enough, then the policy trained on this distribution will plausibly generalize to the real world. A human-specified design choice in domain randomization is the form and parameters of the distribution of simulated environments. It is unclear how to best pick the form and parameters of this distribution and prior work uses hand-tuned distributions. This extended abstract demonstrates that the choice of the distribution plays a major role in the performance of the trained policies in the real world and that the parameter of this distribution can be optimized to maximize the performance of the trained policies in the real world.

Submitted 27 March, 2019; originally announced March 2019.

Comments: 2-page extended abstract

arXiv:1808.08149 [pdf, other]

From Random to Supervised: A Novel Dropout Mechanism Integrated with Global Information

Authors: Hengru Xu, Shen Li, Renfen Hu, Si Li, Sheng Gao

Abstract: Dropout is used to avoid overfitting by randomly dropping units from the neural networks during training. Inspired by dropout, this paper presents GI-Dropout, a novel dropout method integrating with global information to improve neural networks for text classification. Unlike the traditional dropout method in which the units are dropped randomly according to the same probability, we aim to use explicit instructions based on global information of the dataset to guide the training process. With GI-Dropout, the model is supposed to pay more attention to inapparent features or patterns. Experiments demonstrate the effectiveness of the dropout with global information on seven text classification tasks, including sentiment analysis and topic classification.

Submitted 10 October, 2018; v1 submitted 24 August, 2018; originally announced August 2018.

arXiv:1805.09458 [pdf, other]

Invariant Representations without Adversarial Training

Authors: Daniel Moyer, Shuyang Gao, Rob Brekelmans, Greg Ver Steeg, Aram Galstyan

Abstract: Representations of data that are invariant to changes in specified factors are useful for a wide range of problems: removing potential biases in prediction problems, controlling the effects of covariates, and disentangling meaningful factors of variation. Unfortunately, learning representations that exhibit invariance to arbitrary nuisance factors yet remain useful for other tasks is challenging. Existing approaches cast the trade-off between task performance and invariance in an adversarial way, using an iterative minimax optimization. We show that adversarial training is unnecessary and sometimes counter-productive; we instead cast invariant representation learning as a single information-theoretic objective that can be directly optimized. We demonstrate that this approach matches or exceeds performance of state-of-the-art adversarial approaches for learning fair representations and for generative modeling with controllable transformations.

Submitted 2 December, 2019; v1 submitted 23 May, 2018; originally announced May 2018.

Comments: NeurIPS 2018, with corrections

arXiv:1804.10188 [pdf, other]

Modeling Psychotherapy Dialogues with Kernelized Hashcode Representations: A Nonparametric Information-Theoretic Approach

Authors: Sahil Garg, Irina Rish, Guillermo Cecchi, Palash Goyal, Sarik Ghazarian, Shuyang Gao, Greg Ver Steeg, Aram Galstyan

Abstract: We propose a novel dialogue modeling framework, the first-ever nonparametric kernel functions based approach for dialogue modeling, which learns kernelized hashcodes as compressed text representations; unlike traditional deep learning models, it handles well relatively small datasets, while also scaling to large ones. We also derive a novel lower bound on mutual information, used as a model-selection criterion favoring representations with better alignment between the utterances of participants in a collaborative dialogue setting, as well as higher predictability of the generated responses. As demonstrated on three real-life datasets, including prominently psychotherapy sessions, the proposed approach significantly outperforms several state-of-art neural network based dialogue systems, both in terms of computational efficiency, reducing training time from days or weeks to hours, and the response quality, achieving an order of magnitude improvement over competitors in frequency of being chosen as the best model by human evaluators.

Submitted 9 September, 2019; v1 submitted 26 April, 2018; originally announced April 2018.

Comments: Response generative based model added, along with human evaluation

arXiv:1802.05822 [pdf, other]

Auto-Encoding Total Correlation Explanation

Authors: Shuyang Gao, Rob Brekelmans, Greg Ver Steeg, Aram Galstyan

Abstract: Advances in unsupervised learning enable reconstruction and generation of samples from complex distributions, but this success is marred by the inscrutability of the representations learned. We propose an information-theoretic approach to characterizing disentanglement and dependence in representation learning using multivariate mutual information, also called total correlation. The principle of total Cor-relation Ex-planation (CorEx) has motivated successful unsupervised learning applications across a variety of domains, but under some restrictive assumptions. Here we relax those restrictions by introducing a flexible variational lower bound to CorEx. Surprisingly, we find that this lower bound is equivalent to the one in variational autoencoders (VAE) under certain conditions. This information-theoretic view of VAE deepens our understanding of hierarchical VAE and motivates a new algorithm, AnchorVAE, that makes latent codes more interpretable through information maximization and enables generation of richer and more realistic samples.

Submitted 15 February, 2018; originally announced February 2018.

arXiv:1711.01577 [pdf, other]

Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning

Authors: Zhen He, Shaobing Gao, Liang Xiao, Daxue Liu, Hangen He, David Barber

Abstract: Long Short-Term Memory (LSTM) is a popular approach to boosting the ability of Recurrent Neural Networks to store longer term temporal information. The capacity of an LSTM network can be increased by widening and adding layers. However, usually the former introduces additional parameters, while the latter increases the runtime. As an alternative we propose the Tensorized LSTM in which the hidden states are represented by tensors and updated via a cross-layer convolution. By increasing the tensor size, the network can be widened efficiently without additional parameters since the parameters are shared across different locations in the tensor; by delaying the output, the network can be deepened implicitly with little additional runtime since deep computations for each timestep are merged into temporal computations of the sequence. Experiments conducted on five challenging sequence learning tasks show the potential of the proposed model.

Submitted 12 December, 2017; v1 submitted 5 November, 2017; originally announced November 2017.

Comments: Accepted by NIPS 2017

arXiv:1606.02827 [pdf, other]

Variational Information Maximization for Feature Selection

Authors: Shuyang Gao, Greg Ver Steeg, Aram Galstyan

Abstract: Feature selection is one of the most fundamental problems in machine learning. An extensive body of work on information-theoretic feature selection exists which is based on maximizing mutual information between subsets of features and class labels. Practical methods are forced to rely on approximations due to the difficulty of estimating mutual information. We demonstrate that approximations made by existing methods are based on unrealistic assumptions. We formulate a more flexible and general class of assumptions based on variational distributions and use them to tractably generate lower bounds for mutual information. These bounds define a novel information-theoretic framework for feature selection, which we prove to be optimal under tree graphical models with proper choice of variational distributions. Our experiments demonstrate that the proposed method strongly outperforms existing information-theoretic feature selection approaches.

Submitted 9 June, 2016; originally announced June 2016.

Comments: 15 pages, 9 figures

arXiv:1606.02307 [pdf, other]

Sifting Common Information from Many Variables

Authors: Greg Ver Steeg, Shuyang Gao, Kyle Reing, Aram Galstyan

Abstract: Measuring the relationship between any pair of variables is a rich and active area of research that is central to scientific practice. In contrast, characterizing the common information among any group of variables is typically a theoretical exercise with few practical methods for high-dimensional data. A promising solution would be a multivariate generalization of the famous Wyner common information, but this approach relies on solving an apparently intractable optimization problem. We leverage the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed point solution, fast convergence, and complexity that is linear in the number of variables. This scalable approach allows us to demonstrate the usefulness of common information in high-dimensional learning problems. The sieve outperforms standard methods on dimensionality reduction tasks, solves a blind source separation problem that cannot be solved with ICA, and accurately recovers structure in brain imaging data.

Submitted 16 June, 2017; v1 submitted 7 June, 2016; originally announced June 2016.

Comments: In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17). 8 pages, 7 figures. v4: Typos

arXiv:1411.2003 [pdf, other]

Efficient Estimation of Mutual Information for Strongly Dependent Variables

Authors: Shuyang Gao, Greg Ver Steeg, Aram Galstyan

Abstract: We demonstrate that a popular class of nonparametric mutual information (MI) estimators based on k-nearest-neighbor graphs requires number of samples that scales exponentially with the true MI. Consequently, accurate estimation of MI between two strongly dependent variables is possible only for prohibitively large sample size. This important yet overlooked shortcoming of the existing estimators is due to their implicit reliance on local uniformity of the underlying joint distribution. We introduce a new estimator that is robust to local non-uniformity, works well with limited data, and is able to capture relationship strengths over many orders of magnitude. We demonstrate the superior performance of the proposed estimator on both synthetic and real-world data.

Submitted 5 March, 2015; v1 submitted 7 November, 2014; originally announced November 2014.

Comments: 13 pages, to appear in International Conference on Artificial Intelligence and Statistics (AISTATS) 2015

arXiv:1409.6805 [pdf, other]

Improving Cross-domain Recommendation through Probabilistic Cluster-level Latent Factor Model--Extended Version

Authors: Siting Ren, Sheng Gao

Abstract: Cross-domain recommendation has been proposed to transfer user behavior pattern by pooling together the rating data from multiple domains to alleviate the sparsity problem appearing in single rating domains. However, previous models only assume that multiple domains share a latent common rating pattern based on the user-item co-clustering. To capture diversities among different domains, we propose a novel Probabilistic Cluster-level Latent Factor (PCLF) model to improve the cross-domain recommendation performance. Experiments on several real world datasets demonstrate that our proposed model outperforms the state-of-the-art methods for the cross-domain recommendation task.

Submitted 23 September, 2014; originally announced September 2014.

arXiv:1204.2588 [pdf, other]

Probabilistic Latent Tensor Factorization Model for Link Pattern Prediction in Multi-relational Networks

Authors: Sheng Gao, Ludovic Denoyer, Patrick Gallinari

Abstract: This paper aims at the problem of link pattern prediction in collections of objects connected by multiple relation types, where each type may play a distinct role. While common link analysis models are limited to single-type link prediction, we attempt here to capture the correlations among different relation types and reveal the impact of various relation types on performance quality. For that, we define the overall relations between object pairs as a link pattern which consists in interaction pattern and connection structure in the network, and then use tensor formalization to jointly model and predict the link patterns, which we refer to as Link Pattern Prediction (LPP) problem. To address the issue, we propose a Probabilistic Latent Tensor Factorization (PLTF) model by introducing another latent factor for multiple relation types and furnish the Hierarchical Bayesian treatment of the proposed probabilistic model to avoid overfitting for solving the LPP problem. To learn the proposed model we develop an efficient Markov Chain Monte Carlo sampling method. Extensive experiments are conducted on several real world datasets and demonstrate significant improvements over several existing state-of-the-art methods.

Submitted 11 April, 2012; originally announced April 2012.

Comments: 19 pages, 5 figures

MSC Class: 15A69 ACM Class: H.2.8; J.4

arXiv:1204.2581 [pdf, other]

Modeling Relational Data via Latent Factor Blockmodel

Authors: Sheng Gao, Ludovic Denoyer, Patrick Gallinari

Abstract: In this paper we address the problem of modeling relational data, which appear in many applications such as social network analysis, recommender systems and bioinformatics. Previous studies either consider latent feature based models but disregarding local structure in the network, or focus exclusively on capturing local structure of objects based on latent blockmodels without coupling with latent characteristics of objects. To combine the benefits of the previous work, we propose a novel model that can simultaneously incorporate the effect of latent features and covariates if any, as well as the effect of latent structure that may exist in the data. To achieve this, we model the relation graph as a function of both latent feature factors and latent cluster memberships of objects to collectively discover globally predictive intrinsic properties of objects and capture latent block structure in the network to improve prediction performance. We also develop an optimization transfer algorithm based on the generalized EM-style strategy to learn the latent factors. We prove the efficacy of our proposed model through the link prediction task and cluster analysis task, and extensive experiments on the synthetic data and several real world datasets suggest that our proposed LFBM model outperforms the other state of the art approaches in the evaluated tasks.

Submitted 11 April, 2012; originally announced April 2012.

Comments: 10 pages, 12 figures

MSC Class: 15A83 ACM Class: H.2.8; J.4

arXiv:0809.4627 [pdf, ps, other]

doi 10.1007/s11786-011-0068-3

Solving the 100 Swiss Francs Problem

Authors: Mingfu Zhu, Guangran Jiang, Shuhong Gao

Abstract: Sturmfels offered 100 Swiss Francs in 2005 to a conjecture, which deals with a special case of the maximum likelihood estimation for a latent class model. This paper confirms the conjecture positively.

Submitted 27 August, 2011; v1 submitted 26 September, 2008; originally announced September 2008.

MSC Class: 65H10 (Primary); 62P10; 62F30 (Secondary)

Showing 1–40 of 40 results for author: Gao, S