-
Non-linear Mendelian randomization with Two-stage prediction estimation and Control function estimation
Authors:
Xinpei Wang,
Tao Huang,
**zhu Jia
Abstract:
Most of the existing Mendelian randomization (MR) methods are limited by the assumption of linear causality between exposure and outcome, and the development of new non-linear MR methods is highly desirable. We introduce two-stage prediction estimation and control function estimation from econometrics to MR and extend them to non-linear causality. We give conditions for parameter identification and theoretically prove the consistency and asymptotic normality of the estimates. We compare the two methods theoretically under both linear and non-linear causality. We also extend the control function estimation to a more flexible semi-parametric framework without detailed parametric specifications of causality. Extensive simulations numerically corroborate our theoretical results. Application to UK Biobank data reveals non-linear causal relationships between sleep duration and systolic/diastolic blood pressure.
Submitted 1 February, 2024;
originally announced February 2024.
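As a rough illustration of the control function idea mentioned in the abstract above, the following minimal sketch covers only the linear case on simulated data. The instrument `g`, the confounder `u`, the effect size 0.5, and the helper `ols` are all assumptions made for this example; it is not the authors' estimator and does not cover their non-linear or semi-parametric extensions.

```python
import numpy as np

def ols(design, response):
    """Least-squares coefficients for response ~ design (design holds the intercept column)."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 20_000
g = rng.binomial(2, 0.3, size=n).astype(float)  # hypothetical instrument (allele count)
u = rng.normal(size=n)                           # unobserved confounder
x = 1.0 * g + u + rng.normal(size=n)             # exposure
y = 0.5 * x + u + rng.normal(size=n)             # outcome; simulated causal effect is 0.5

ones = np.ones(n)

# Naive regression of y on x absorbs the confounder and overstates the effect.
naive_effect = ols(np.column_stack([ones, x]), y)[1]

# Stage 1: regress the exposure on the instrument and keep the residual.
stage1 = np.column_stack([ones, g])
v_hat = x - stage1 @ ols(stage1, x)

# Stage 2: regress the outcome on the exposure plus the stage-1 residual,
# which acts as a proxy for the confounded part of the exposure.
cf_effect = ols(np.column_stack([ones, x, v_hat]), y)[1]

print(f"naive: {naive_effect:.3f}  control function: {cf_effect:.3f}")
```

In this linear setting the control function estimate coincides with the usual two-stage least squares estimate; the two approaches can differ once the exposure enters the outcome model non-linearly.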
-
Geometry-Aware Adaptation for Pretrained Models
Authors:
Nicholas Roberts,
Xintong Li,
Dyah Adila,
Sonia Cromp,
Tzu-Heng Huang,
Jitian Zhao,
Frederic Sala
Abstract:
Machine learning models -- including prominent zero-shot models -- are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes -- or, in the case of zero-shot prediction, to improve its performance -- without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping argmax with the Fréchet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active learning-like next class selection procedure to obtain optimal training classes for when it is not possible to predict the entire range of unobserved classes. Empirically, using easily available external metrics, our proposed approach, Loki, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, Loki can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.
Submitted 27 November, 2023; v1 submitted 23 July, 2023;
originally announced July 2023.
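The prediction rule described in the abstract above, replacing argmax with a Fréchet mean, can be sketched as follows. This is a minimal illustration that assumes the Fréchet mean is taken over a finite label space with squared distances weighted by the model's class probabilities; the function name `frechet_mean_predict` and the toy three-label metric are hypothetical and are not taken from the Loki code.

```python
import numpy as np

def frechet_mean_predict(probs, dist):
    """Pick the label minimizing the probability-weighted squared distance.

    probs: shape (k,), model probabilities over the k observed classes.
    dist:  shape (m, k), dist[y, j] is the distance from candidate label y
           (any label in the full space) to observed class j.
    """
    probs = np.asarray(probs, dtype=float)
    dist = np.asarray(dist, dtype=float)
    cost = (dist ** 2) @ probs  # cost[y] = sum_j probs[j] * dist[y, j]**2
    return int(np.argmin(cost))

# Toy label space {0, 1, 2} on a line with distance |a - b|.
# The model was only trained on classes 0 and 2 (the columns of dist).
dist = np.array([[0.0, 2.0],
                 [1.0, 1.0],
                 [2.0, 0.0]])

confident = frechet_mean_predict([0.9, 0.1], dist)  # mass concentrated on class 0
split = frechet_mean_predict([0.5, 0.5], dist)      # mass split between classes 0 and 2
print(confident, split)
```

When the probability mass is split evenly between the two observed classes, the rule returns the intermediate label 1, which plain argmax over observed classes could never output.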
-
Optimal Differentially Private Model Training with Public Data
Authors:
Andrew Lowy,
Zeman Li,
Tianjian Huang,
Meisam Razaviyayn
Abstract:
Differential privacy (DP) ensures that training a machine learning model does not leak private data. In practice, we may have access to auxiliary public data that is free of privacy concerns. In this work, we assume access to a given amount of public data and settle the following fundamental open questions: 1. What is the optimal (worst-case) error of a DP model trained over a private data set while having access to side public data? 2. How can we harness public data to improve DP model training in practice? We consider these questions in both the local and central models of pure and approximate DP. To answer the first question, we prove tight (up to log factors) lower and upper bounds that characterize the optimal error rates of three fundamental problems: mean estimation, empirical risk minimization, and stochastic convex optimization. We show that the optimal error rates can be attained (up to log factors) by either discarding private data and training a public model, or treating public data like it is private and using an optimal DP algorithm. To address the second question, we develop novel algorithms that are "even more optimal" (i.e. better constants) than the asymptotically optimal approaches described above. For local DP mean estimation, our algorithm is optimal including constants. Empirically, our algorithms show benefits over the state-of-the-art.
Submitted 13 February, 2024; v1 submitted 26 June, 2023;
originally announced June 2023.
-
A mixture logistic model for panel data with a Markov structure
Authors:
Yu-Hsiang Cheng,
Tzee-Ming Huang
Abstract:
In this study, we propose a mixture logistic regression model with a Markov structure, and consider the estimation of model parameters using maximum likelihood estimation. We also provide a forward type variable selection algorithm to choose the important explanatory variables to reduce the number of parameters in the proposed model.
Submitted 24 July, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
-
A Constrained Spatial Autoregressive Model for Interval-valued data
Authors:
Tingting Huang
Abstract:
Interval-valued data has received much attention because of its wide applications in finance, econometrics, meteorology and medicine. However, most regression models developed for interval-valued data assume that observations are mutually independent, and so are not suited to settings in which individuals are spatially correlated. We propose a new linear model that accommodates areal-type spatial dependency in interval-valued data. Specifically, spatial correlation among the centers of the responses is considered. To improve the new model's prediction accuracy, we add three inequality constraints. Parameters are obtained by an algorithm combining a grid search technique with the constrained least squares method. Numerical experiments are designed to examine the prediction performance of the proposed model. We also use a weather dataset to demonstrate the usefulness of our model.
Submitted 27 October, 2022;
originally announced October 2022.
-
AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels
Authors:
Nicholas Roberts,
Xintong Li,
Tzu-Heng Huang,
Dyah Adila,
Spencer Schoenberg,
Cheng-Yu Liu,
Lauren Pick,
Haotian Ma,
Aws Albarghouthi,
Frederic Sala
Abstract:
Weak supervision (WS) is a powerful method to build labeled datasets for training supervised models in the face of little-to-no labeled data. It replaces hand-labeling data with aggregating multiple noisy-but-cheap label estimates expressed by labeling functions (LFs). While it has been used successfully in many domains, weak supervision's application scope is limited by the difficulty of constructing labeling functions for domains with complex or high-dimensional features. To address this, a handful of methods have proposed automating the LF design process using a small set of ground truth labels. In this work, we introduce AutoWS-Bench-101: a framework for evaluating automated WS (AutoWS) techniques in challenging WS settings -- a set of diverse application domains on which it has been previously difficult or impossible to apply traditional WS techniques. While AutoWS is a promising direction toward expanding the application-scope of WS, the emergence of powerful methods such as zero-shot foundation models reveals the need to understand how AutoWS techniques compare or cooperate with modern zero-shot or few-shot learners. This informs the central question of AutoWS-Bench-101: given an initial set of 100 labels for each task, we ask whether a practitioner should use an AutoWS method to generate additional labels or use some simpler baseline, such as zero-shot predictions from a foundation model or supervised learning. We observe that in many settings, it is necessary for AutoWS methods to incorporate signal from foundation models if they are to outperform simple few-shot baselines, and AutoWS-Bench-101 promotes future research in this direction. We conclude with a thorough ablation study of AutoWS methods.
Submitted 24 November, 2023; v1 submitted 30 August, 2022;
originally announced August 2022.
-
Eddy Covariance: A Scientometric Review (1981-2018)
Authors:
Tian-Yuan Huang,
Yi-Fei Liu,
Yuan-Chen Wang,
Hai-Qing Guo,
Jun Ma,
Bin Zhao
Abstract:
The history of the eddy covariance (EC) measuring system dates back about 100 years, but only in recent decades has EC gained popularity and become widely used in global change ecological studies, with an explosion of related work published in various journals. Investigating 8297 publications related to EC from 1981 to 2018, we make a comprehensive and critical review of the scientific development of EC from a scientometric perspective. First, the paper outlines general bibliometric statistics, including publication number, country contribution, productive institutions, active authors, journal distribution, highly cited articles and fund support, to provide an informative picture of EC studies. Second, research trends are revealed by network visualization and modeling based on keyword analysis, from which we can discover the knowledge structure of EC and detect the research focus and hotspot transitions in different periods. Third, collaboration in the EC research community is explored. FLUXNET is the largest global network uniting EC researchers; we quantify and evaluate its performance using bibliometric indicators of cooperation and citation. Specific discussion is given to the historical development of EC, including technical maturation and application promotion. Considering the current barriers to collaboration, the review closes by analyzing the reasons hindering data sharing and discussing prospects for new models of data-intensive collaboration.
Submitted 4 April, 2022; v1 submitted 3 April, 2022;
originally announced April 2022.
-
A Flexible and Parsimonious Modelling Strategy for Clustered Data Analysis
Authors:
Tao Huang,
Youquan Pei,
**hong You,
Wenyang Zhang
Abstract:
Statistical modelling strategy is the key to success in data analysis. The trade-off between flexibility and parsimony plays a vital role in statistical modelling. In clustered data analysis, in order to account for the heterogeneity between the clusters, certain flexibility is necessary in the modelling, yet parsimony is also needed to guard against the complexity and account for the homogeneity among the clusters. In this paper, we propose a flexible and parsimonious modelling strategy for clustered data analysis. The strategy strikes a nice balance between flexibility and parsimony, and accounts well for both heterogeneity and homogeneity among the clusters, which often come with strong practical meanings. In fact, its usefulness goes beyond clustered data analysis: it also sheds promising light on transfer learning. An estimation procedure is developed for the unknowns in the resulting model, and asymptotic properties of the estimators are established. Intensive simulation studies are conducted to demonstrate how well the proposed methods work, and a real data analysis is also presented to illustrate how to apply the modelling strategy and associated estimation procedure to answer some real problems arising from real life.
Submitted 16 February, 2023; v1 submitted 1 April, 2022;
originally announced April 2022.
-
An empirical exploration of the diversified R ecosystem
Authors:
Tian-Yuan Huang,
Zhilan Lou
Abstract:
Born in the late 20s, R is one of the most popular software environments for statistical computing and graphics. With the development of information technology and the advent of the big data era, great changes have taken place in the R ecosystem. Based on the meta information of the Comprehensive R Archive Network (CRAN) and the bibliometric data of literature citing R, we find that while R originated in statistics, its development has benefited greatly from computer science, and its main academic users come from various disciplines such as agricultural science, biological science, environmental science and medical science. In addition, we describe the collaboration patterns among R developers and analyze the possible effects of collaboration in the R community.
Submitted 6 December, 2023; v1 submitted 13 January, 2022;
originally announced January 2022.
-
Efficient Estimation of the Maximal Association between Multiple Predictors and a Survival Outcome
Authors:
Tzu-Jung Huang,
Alex Luedtke,
Ian W. McKeague
Abstract:
This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much remains to be done to make the methods both reliable and computationally scalable in high dimensions. Machine learning tools are commonly used to provide predictions of survival outcomes, but the estimated effect of a selected predictor suffers from confirmation bias unless the selection is taken into account. The new approach involves construction of semi-parametrically efficient estimators of the linear association between the predictors and the survival outcome, which are used to build a test statistic for detecting the presence of an association between any of the predictors and the outcome. Further, a stabilization technique reminiscent of bagging allows a normal calibration for the resulting test statistic, which enables the construction of confidence intervals for the maximal association between predictors and the outcome and also greatly reduces computational cost. Theoretical results show that this testing procedure is valid even when the number of predictors grows superpolynomially with sample size, and our simulations support that this asymptotic guarantee is indicative of the performance of the test at moderate sample sizes. The new approach is applied to the problem of identifying patterns in viral gene expression associated with the potency of an antiviral drug.
Submitted 21 December, 2021;
originally announced December 2021.
-
Improved Efficiency for Cross-Arm Comparisons via Platform Designs
Authors:
Tzu-Jung Huang,
Alex Luedtke,
the AMP Investigators Group
Abstract:
Though platform trials have been touted for their flexibility and streamlined use of trial resources, their statistical efficiency is not well understood. We fill this gap by establishing their greater efficiency for comparing the relative efficacy of multiple interventions over using several separate, two-arm trials, where the relative efficacy of an arbitrary pair of interventions is evaluated by contrasting their relative risks as compared to control. In theoretical and numerical studies, we demonstrate that the inference of such a contrast using data from a platform trial enjoys identical or better precision than using data from separate trials, even when the former enrolls substantially fewer participants. This benefit is attributed to the sharing of controls among interventions under contemporaneous randomization, which is a key feature of platform trials. We further provide a novel procedure for establishing the non-inferiority of a given intervention relative to the most efficacious of the other interventions under evaluation, where this procedure is adaptive in the sense that it need not be a priori known which of these other interventions is most efficacious. Our numerical studies show that this testing procedure can attain substantially better power when the data arise from a platform trial rather than multiple separate trials. Our results are illustrated using data from two monoclonal antibody trials for the prevention of HIV.
Submitted 26 January, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
SNIP: An Adaptation of Sorted Neighborhood Methods for Deduplicating Pedigree Data
Authors:
Theodore Huang,
Matthew Ploenzke,
Danielle Braun
Abstract:
Pedigree data contain family history information that is used to analyze hereditary diseases. These clinical data sets may contain duplicate records due to the same family visiting a clinic multiple times or a clinician entering multiple versions of the family for testing purposes. Drawing inferences from the data, or using them for training or validation, without removing the duplicates could lead to invalid conclusions, and hence identifying the duplicates is essential. Since family structures can be complex, existing deduplication algorithms cannot be applied directly. We first motivate the importance of deduplication by examining the impact of pedigree duplicates on the training and validation of a familial risk prediction model. We then introduce an unsupervised algorithm, which we call SNIP (Sorted NeIghborhood for Pedigrees), that builds on the sorted neighborhood method to efficiently find and classify pairwise comparisons by leveraging the inherent hierarchical nature of the pedigrees. We conduct a simulation study to assess the performance of the algorithm and find parameter configurations where the algorithm is able to accurately detect the duplicates. We then apply the method to data from the Risk Service, which includes over 300,000 pedigrees at high risk of hereditary cancers, and uncover large clusters of potential duplicate families. After removing 104,520 pedigrees (33% of original data), the resulting Risk Service dataset can now be used for future analysis, training, and validation. The algorithm is available as the R package snipR at https://github.com/bayesmendel/snipR.
Submitted 19 August, 2021;
originally announced August 2021.
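The abstract above builds on the sorted neighborhood method. A minimal sketch of that generic building block is given here: records are sorted by a blocking key and only records that fall within a sliding window of each other are compared. This is not SNIP itself, which adds pedigree-specific handling; the function name `sorted_neighborhood_pairs`, the toy string records, and the window size are assumptions made for this illustration.

```python
def sorted_neighborhood_pairs(records, key, window):
    """Return candidate duplicate pairs (as index tuples) from a sliding window.

    records: list of items to deduplicate.
    key:     function mapping a record to a sortable blocking key.
    window:  number of consecutive sorted records compared with each other.
    """
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = []
    for pos, i in enumerate(order):
        # Compare record i only with the next (window - 1) records in sort order.
        for j in order[pos + 1 : pos + window]:
            pairs.append((min(i, j), max(i, j)))
    return pairs

# Hypothetical family identifiers; the two "smith" entries are near-duplicates.
records = ["smith_1950", "jones_1972", "smith_1951", "adams_1980"]
pairs = sorted_neighborhood_pairs(records, key=lambda r: r, window=2)
print(pairs)
```

With a window of 2, only neighbours in sort order are compared, so the number of candidate pairs grows linearly with the number of records instead of quadratically.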
-
On Generalization of Graph Autoencoders with Adversarial Training
Authors:
Tian** Huang,
Yulong Pei,
Vlado Menkovski,
Mykola Pechenizkiy
Abstract:
Adversarial training is an approach for increasing a model's resilience against adversarial perturbations. Such approaches have been demonstrated to result in models with feature representations that generalize better. However, limited work has been done on adversarial training of models on graph data. In this paper, we raise the following question: does adversarial training improve the generalization of graph representations? We formulate L2 and L1 versions of adversarial training in two powerful node embedding methods: graph autoencoder (GAE) and variational graph autoencoder (VGAE). We conduct extensive experiments on three main applications of GAE and VGAE, i.e. link prediction, node clustering, and graph anomaly detection, and demonstrate that both L2 and L1 adversarial training boost the generalization of GAE and VGAE.
Submitted 4 August, 2021; v1 submitted 6 July, 2021;
originally announced July 2021.
-
A Decentralized Adaptive Momentum Method for Solving a Class of Min-Max Optimization Problems
Authors:
Babak Barazandeh,
Tianjian Huang,
George Michailidis
Abstract:
Min-max saddle point games have recently been intensely studied, due to their wide range of applications, including training Generative Adversarial Networks (GANs). However, most of the recent efforts for solving them are limited to special regimes such as convex-concave games. Further, it is customarily assumed that the underlying optimization problem is solved either by a single machine or, in the case of multiple machines, by machines connected in a centralized fashion, wherein each one communicates with a central node. The latter approach becomes challenging when the underlying communications network has low bandwidth. In addition, privacy considerations may dictate that certain nodes can communicate with only a subset of other nodes. Hence, it is of interest to develop methods that solve min-max games in a decentralized manner. To that end, we develop a decentralized adaptive momentum (ADAM)-type algorithm for solving min-max optimization problems under the condition that the objective function satisfies a Minty Variational Inequality condition, which is a generalization of the convex-concave case. The proposed method overcomes shortcomings of recent non-adaptive gradient-based decentralized algorithms for min-max optimization problems that do not perform well in practice and require careful tuning. In this paper, we obtain non-asymptotic rates of convergence of the proposed algorithm (coined DADAM$^3$) for finding a (stochastic) first-order Nash equilibrium point and subsequently evaluate its performance on training GANs. The extensive empirical evaluation shows that DADAM$^3$ outperforms recently developed methods, including decentralized optimistic stochastic gradient for solving such min-max problems.
Submitted 28 June, 2021; v1 submitted 10 June, 2021;
originally announced June 2021.
-
Gradient Boosted Binary Histogram Ensemble for Large-scale Regression
Authors:
Hanyuan Hang,
Tao Huang,
Yuchao Cai,
Hanfang Yang,
Zhouchen Lin
Abstract:
In this paper, we propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE) based on binary histogram partition and ensemble learning. From the theoretical perspective, by assuming the Hölder continuity of the target function, we establish the statistical convergence rate of GBBHE in the space $C^{0,α}$ and $C^{1,0}$, where a lower bound of the convergence rate for the base learner demonstrates the advantage of boosting. Moreover, in the space $C^{1,0}$, we prove that the number of iterations to achieve the fast convergence rate can be reduced by using an ensemble regressor as the base learner, which improves the computational efficiency. In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), Breiman's forest, and kernel-based methods, our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
Submitted 3 June, 2021;
originally announced June 2021.
-
Extending Models Via Gradient Boosting: An Application to Mendelian Models
Authors:
Theodore Huang,
Gregory Idos,
Christine Hong,
Stephen Gruber,
Giovanni Parmigiani,
Danielle Braun
Abstract:
Improving existing widely-adopted prediction models is often a more efficient and robust way towards progress than training new models from scratch. Existing models may (a) incorporate complex mechanistic knowledge, (b) leverage proprietary information, and (c) have surmounted barriers to adoption. Compared to model training, model improvement and modification receive little attention. In this paper we propose a general approach to model improvement: we combine gradient boosting with any previously developed model to improve model performance while retaining important existing characteristics. To exemplify, we consider the context of Mendelian models, which estimate the probability of carrying genetic mutations that confer susceptibility to disease by using family pedigrees and health histories of family members. Via simulations we show that integration of gradient boosting with an existing Mendelian model can produce an improved model that outperforms both that model and the model built using gradient boosting alone. We illustrate the approach on genetic testing data from the USC-Stanford Cancer Genetics Hereditary Cancer Panel (HCP) study.
Submitted 13 May, 2021;
originally announced May 2021.
-
GDA-HIN: A Generalized Domain Adaptive Model across Heterogeneous Information Networks
Authors:
Tiancheng Huang,
Ke Xu,
Donglin Wang
Abstract:
Domain adaptation using graph-structured networks learns label-discriminative and network-invariant node embeddings by sharing graph parameters. Most existing works focus on domain adaptation of homogeneous networks. The few works that study heterogeneous cases only consider shared node types but ignore private node types in individual networks. However, for given source and target heterogeneous networks, they generally contain shared and private node types, where private types bring an extra challenge for graph domain adaptation. In this paper, we investigate Heterogeneous Information Networks (HINs) with both shared and private node types and propose a Generalized Domain Adaptive model across HINs (GDA-HIN) to handle the domain shift between them. GDA-HIN can not only align the distribution of identical-type nodes and edges in two HINs but also make full use of different-type nodes and edges to improve the performance of knowledge transfer. Extensive experiments on several datasets demonstrate that GDA-HIN can outperform state-of-the-art methods in various domain adaptation tasks across heterogeneous networks.
Submitted 25 September, 2022; v1 submitted 10 December, 2020;
originally announced December 2020.
-
condLSTM-Q: A novel deep learning model for predicting Covid-19 mortality in fine geographical Scale
Authors:
HyeongChan Jo,
Juhyun Kim,
Tzu-Chen Huang,
Yu-Li Ni
Abstract:
Predictive models with a focus on different spatial-temporal scales benefit governments and healthcare systems to combat the COVID-19 pandemic. Here we present the conditional Long Short-Term Memory networks with Quantile output (condLSTM-Q), a well-performing model for making quantile predictions on COVID-19 death tolls at the county level with a two-week forecast window. This fine geographical scale is a rare but useful feature in publicly available predictive models, which would especially benefit state-level officials to coordinate resources within the state. The quantile predictions from condLSTM-Q inform people about the distribution of the predicted death tolls, allowing better evaluation of possible trajectories of the severity. Given the scalability and generalizability of neural network models, this model could incorporate additional data sources with ease, and could be further developed to generate other useful predictions such as new cases or hospitalizations intuitively.
Submitted 23 November, 2020;
originally announced November 2020.
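The abstract above does not state how condLSTM-Q is trained to output quantiles. A common choice for quantile regression is the pinball (quantile) loss; the sketch below assumes that loss purely for illustration, and the function name `pinball_loss` is hypothetical rather than taken from the paper.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for a quantile level q in (0, 1).

    Assumption: this is a generic quantile-regression loss, not necessarily
    the objective used by condLSTM-Q.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff >= 0) is weighted by q,
        # over-prediction (diff < 0) by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)
```

With a high quantile level such as `q = 0.9`, predicting too low is penalized more heavily than predicting too high, which pushes the fitted value toward the upper tail of the death-toll distribution.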
-
PanelPRO: A R package for multi-syndrome, multi-gene risk modeling for individuals with a family history of cancer
Authors:
Gavin Lee,
Qing Zhang,
Jane W. Liang,
Theodore Huang,
Christine Choirat,
Giovanni Parmigiani,
Danielle Braun
Abstract:
Identifying individuals who are at high risk of cancer due to inherited germline mutations is critical for effective implementation of personalized prevention strategies. Most existing models to identify these individuals focus on specific syndromes by including family and personal history for a small number of cancers. Recent evidence from multi-gene panel testing has shown that many syndromes once thought to be distinct are overlapping, motivating the development of models that incorporate family history information on several cancers and predict mutations for more comprehensive panels of genes.
One such class of models is Mendelian risk prediction models, which use family history information and Mendelian laws of inheritance to estimate the probability of carrying genetic mutations, as well as future risk of developing associated cancers. To flexibly model the complexity of many cancer-mutation associations, we present a new software tool called PanelPRO, an R package that extends the previously developed BayesMendel R package to user-selected lists of susceptibility genes and associated cancers. The model identifies individuals at an increased risk of carrying cancer susceptibility gene mutations and predicts future risk of developing hereditary cancers associated with those genes. Additional functionalities adjust for prophylactic interventions, known genetic testing results, and risk modifiers such as race and ancestry. The package comes with a customizable database with default parameter values estimated from published studies.
The PanelPRO package is open-source and provides a fast and flexible back-end for multi-gene, multi-cancer risk modeling with pedigree data. The software enables the identification of high-risk individuals, which will have an impact on personalized prevention strategies for cancer and individualized decision making about genetic testing.
Submitted 24 October, 2020;
originally announced October 2020.
-
ResGCN: Attention-based Deep Residual Modeling for Anomaly Detection on Attributed Networks
Authors:
Yulong Pei,
Tian** Huang,
Werner van Ipenburg,
Mykola Pechenizkiy
Abstract:
Effectively detecting anomalous nodes in attributed networks is crucial for the success of many real-world applications such as fraud and intrusion detection. Existing approaches have difficulties with three major issues: sparsity and nonlinearity capturing, residual modeling, and network smoothing. We propose Residual Graph Convolutional Network (ResGCN), an attention-based deep residual modeling approach that can tackle these issues: modeling the attributed networks with GCN makes it possible to capture sparsity and nonlinearity; utilizing a deep neural network makes it possible to learn the residual directly from the input; and a residual-based attention mechanism reduces the adverse effect of anomalous nodes and prevents over-smoothing. Extensive experiments on several real-world attributed networks demonstrate the effectiveness of ResGCN in detecting anomalies.
Submitted 30 September, 2020;
originally announced September 2020.
-
How Useful Are the Machine-Generated Interpretations to General Users? A Human Evaluation on Guessing the Incorrectly Predicted Labels
Authors:
Hua Shen,
Ting-Hao Kenneth Huang
Abstract:
Explaining to users why automated systems make certain mistakes is important and challenging. Researchers have proposed ways to automatically produce interpretations for deep neural network models. However, it is unclear how useful these interpretations are in helping users figure out why they are getting an error. If an interpretation effectively explains to users how the underlying deep neural network model works, people who were presented with the interpretation should be better at predicting the model's outputs than those who were not. This paper presents an investigation on whether or not showing machine-generated visual interpretations helps users understand the incorrectly predicted labels produced by image classifiers. We showed the images and the correct labels to 150 online crowd workers and asked them to select the incorrectly predicted labels with or without showing them the machine-generated visual interpretations. The results demonstrated that displaying the visual interpretations did not increase, but rather decreased, the average guessing accuracy by roughly 10%.
Submitted 27 August, 2020; v1 submitted 26 August, 2020;
originally announced August 2020.
-
Population stratification enables modeling effects of reopening policies on mortality and hospitalization rates
Authors:
Tongtong Huang,
Yan Chu,
Shayan Shams,
Ye** Kim,
Genevera Allen,
Ananth V Annapragada,
Devika Subramanian,
Ioannis Kakadiaris,
Assaf Gottlieb,
Xiaoqian Jiang
Abstract:
Objective: We study the influence of local reopening policies on the composition of the infectious population and their impact on future hospitalization and mortality rates. Materials and Methods: We collected datasets of daily reported hospitalization and cumulative mortality of COVID-19 in Houston, Texas, from May 1, 2020 until June 29, 2020. These datasets are from multiple sources (USA FACTS, Southeast Texas Regional Advisory Council COVID-19 report, TMC daily news, and New York Times county level mortality reporting). Our model, risk stratified SIR HCD, uses separate variables to model the dynamics of local contact (e.g., work from home) and high contact (e.g., work on site) subpopulations while sharing parameters to control their respective $R_0(t)$ over time. Results: We evaluated our model's forecasting performance in Harris County, TX (the most populated county in the Greater Houston area) during the Phase I and Phase II reopening. Not only did our model outperform other competing models, it also supports counterfactual analysis to simulate the impact of future policies in a local setting, which is unique among existing approaches. Discussion: Local mortality and hospitalization are significantly impacted by quarantine and reopening policies. No existing model has directly accounted for the effect of these policies on local trends in infections, hospitalizations, and deaths in an explicit and explainable manner. Our work is an attempt to close this important technical gap to support decision making. Conclusion: Despite several limitations, we think it is a timely effort to rethink how best to model the dynamics of pandemics under the influence of reopening policies.
△ Less
Submitted 10 August, 2020;
originally announced August 2020.
-
Combining Breast Cancer Risk Prediction Models
Authors:
Zoe Guan,
Theodore Huang,
Anne Marie McCarthy,
Kevin S. Hughes,
Alan Semine,
Hajime Uno,
Lorenzo Trippa,
Giovanni Parmigiani,
Danielle Braun
Abstract:
Accurate risk stratification is key to reducing cancer morbidity through targeted screening and preventative interventions. Numerous breast cancer risk prediction models have been developed, but they often give predictions with conflicting clinical implications. Integrating information from different models may improve the accuracy of risk predictions, which would be valuable for both clinicians and patients. BRCAPRO and BCRAT are two widely used models based on largely complementary sets of risk factors. BRCAPRO is a Bayesian model that uses detailed family history information to estimate the probability of carrying a BRCA1/2 mutation, as well as future risk of breast and ovarian cancer, based on mutation prevalence and penetrance (age-specific probability of developing cancer given genotype). BCRAT uses a relative hazard model based on first-degree family history and non-genetic risk factors. We consider two approaches for combining BRCAPRO and BCRAT: 1) modifying the penetrance functions in BRCAPRO using relative hazard estimates from BCRAT, and 2) training an ensemble model that takes as input BRCAPRO and BCRAT predictions. We show that the combination models achieve performance gains over BRCAPRO and BCRAT in simulations and data from the Cancer Genetics Network.
Submitted 31 July, 2020;
originally announced August 2020.
-
Unified statistical inference for a novel nonlinear dynamic functional/longitudinal data model
Authors:
Lixia Hu,
Tao Huang,
**hong You
Abstract:
In light of recent work studying massive functional/longitudinal data, such as the resulting data from the COVID-19 pandemic, we propose a novel functional/longitudinal data model which is a combination of the popular varying coefficient (VC) model and additive model. We call it Semi-VCAM in which the response could be a functional/longitudinal variable, and the explanatory variables could be a mixture of functional/longitudinal and scalar variables. Notably some of the scalar variables could be categorical variables as well. The Semi-VCAM simultaneously allows for both substantial flexibility and the maintaining of one-dimensional rates of convergence. A local linear smoothing with the aid of an initial B spline series approximation is developed to estimate the unknown functional effects in the model. To avoid the subjective choice between the sparse and dense cases of the data, we establish the asymptotic theories of the resultant Pilot Estimation Based Local Linear Estimators (PEBLLE) on a unified framework of sparse, dense and ultra-dense cases of the data. Moreover, we construct unified consistent tests to justify whether a parsimony submodel is sufficient or not. These test methods also avoid the subjective choice between the sparse, dense and ultra dense cases of the data. Extensive Monte Carlo simulation studies investigating the finite sample performance of the proposed methodologies confirm our asymptotic results. We further illustrate our methodologies via analyzing the COVID-19 data from China and the CD4 data.
Submitted 3 July, 2020;
originally announced July 2020.
-
Non-convex Min-Max Optimization: Applications, Challenges, and Recent Theoretical Advances
Authors:
Meisam Razaviyayn,
Tianjian Huang,
Songtao Lu,
Maher Nouiehed,
Maziar Sanjabi,
Mingyi Hong
Abstract:
The min-max optimization problem, also known as the saddle point problem, is a classical optimization problem which is also studied in the context of zero-sum games. Given a class of objective functions, the goal is to find a value for the argument which leads to a small objective value even for the worst case function in the given class. Min-max optimization problems have recently become very popular in a wide range of signal and data processing applications such as fair beamforming, training generative adversarial networks (GANs), and robust machine learning, to name just a few. The overarching goal of this article is to provide a survey of recent advances for an important subclass of min-max problems, where the minimization and maximization problems can be non-convex and/or non-concave. In particular, we will first present a number of applications to showcase the importance of such min-max problems; then we discuss key theoretical challenges, and provide a selective review of some exciting recent theoretical and algorithmic advances in tackling non-convex min-max problems. Finally, we will point out open questions and future research directions.
Submitted 18 August, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Learning Individually Inferred Communication for Multi-Agent Cooperation
Authors:
Ziluo Ding,
Tiejun Huang,
Zongqing Lu
Abstract:
Communication lays the foundation for human cooperation. It is also crucial for multi-agent cooperation. However, existing work focuses on broadcast communication, which is not only impractical but also leads to information redundancy that could even impair the learning process. To tackle these difficulties, we propose Individually Inferred Communication (I2C), a simple yet effective model to enable agents to learn a prior for agent-agent communication. The prior knowledge is learned via causal inference and realized by a feed-forward neural network that maps the agent's local observation to a belief about who to communicate with. The influence of one agent on another is inferred via the joint action-value function in multi-agent reinforcement learning and quantified to label the necessity of agent-agent communication. Furthermore, the agent policy is regularized to better exploit communicated messages. Empirically, we show that I2C can not only reduce communication overhead but also improve the performance in a variety of multi-agent cooperative scenarios, compared to existing methods. The code is available at https://github.com/PKU-AI-Edge/I2C.
Submitted 28 April, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining
Authors:
Yiqun Mei,
Yuchen Fan,
Yuqian Zhou,
Lichao Huang,
Thomas S. Huang,
Humphrey Shi
Abstract:
Deep convolution-based single image super-resolution (SISR) networks embrace the benefits of learning from large-scale external image resources for local recovery, yet most existing works have ignored the long-range feature-wise similarities in natural images. Some recent works have successfully leveraged this intrinsic feature correlation by exploring non-local attention modules. However, none of the current deep models have studied another inherent property of images: cross-scale feature correlation. In this paper, we propose the first Cross-Scale Non-Local (CS-NL) attention module with integration into a recurrent neural network. By combining the new CS-NL prior with local and in-scale non-local priors in a powerful recurrent fusion cell, we can find more cross-scale feature correlations within a single low-resolution (LR) image. The performance of SISR is significantly improved by exhaustively integrating all possible priors. Extensive experiments demonstrate the effectiveness of the proposed CS-NL module by setting new state-of-the-arts on multiple SISR benchmarks.
Submitted 2 June, 2020;
originally announced June 2020.
-
Pyramid Attention Networks for Image Restoration
Authors:
Yiqun Mei,
Yuchen Fan,
Yulun Zhang,
Jiahui Yu,
Yuqian Zhou,
Ding Liu,
Yun Fu,
Thomas S. Huang,
Humphrey Shi
Abstract:
Self-similarity refers to the image prior widely used in image restoration algorithms that small but similar patterns tend to occur at different locations and scales. However, recent advanced deep convolutional neural network based methods for image restoration do not take full advantage of self-similarities by relying on self-attention neural modules that only process information at the same scale. To solve this problem, we present a novel Pyramid Attention module for image restoration, which captures long-range feature correspondences from a multi-scale feature pyramid. Inspired by the fact that corruptions, such as noise or compression artifacts, drop drastically at coarser image scales, our attention module is designed to be able to borrow clean signals from their "clean" correspondences at the coarser levels. The proposed pyramid attention module is a generic building block that can be flexibly integrated into various neural architectures. Its effectiveness is validated through extensive experiments on multiple image restoration tasks: image denoising, demosaicing, compression artifact reduction, and super resolution. Without any bells and whistles, our PANet (pyramid attention module with simple network backbones) can produce state-of-the-art results with superior accuracy and visual quality. Our code will be available at https://github.com/SHI-Labs/Pyramid-Attention-Networks
Submitted 3 June, 2020; v1 submitted 28 April, 2020;
originally announced April 2020.
-
Long-term Prediction of Vehicle Behavior using Short-term Uncertainty-aware Trajectories and High-definition Maps
Authors:
Sai Yalamanchi,
Tzu-Kuo Huang,
Galen Clark Haynes,
Nemanja Djuric
Abstract:
Motion prediction of surrounding vehicles is one of the most important tasks handled by a self-driving vehicle, and represents a critical step in the autonomous system necessary to ensure safety for all the involved traffic actors. Recently a number of researchers from both academic and industrial communities have focused on this important problem, proposing ideas ranging from engineered, rule-based methods to learned approaches, shown to perform well at different prediction horizons. In particular, while for longer-term trajectories the engineered methods outperform the competing approaches, the learned methods have proven to be the best choice at short-term horizons. In this work we describe how to overcome the discrepancy between these two research directions, and propose a method that combines the disparate approaches under a single unifying framework. The resulting algorithm fuses learned, uncertainty-aware trajectories with lane-based paths in a principled manner, resulting in improved prediction accuracy at both shorter- and longer-term horizons. Experiments on real-world, large-scale data strongly suggest benefits of the proposed unified method, which outperformed the existing state-of-the-art. Moreover, following offline evaluation the proposed method was successfully tested onboard a self-driving vehicle.
Submitted 12 June, 2020; v1 submitted 13 March, 2020;
originally announced March 2020.
-
Kernel Quantization for Efficient Network Compression
Authors:
Zhongzhi Yu,
Yemin Shi,
Tiejun Huang,
Yizhou Yu
Abstract:
This paper presents a novel network compression framework, Kernel Quantization (KQ), which aims to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version without significant performance loss. Unlike existing methods struggling with weight bit-length, KQ has the potential to improve the compression ratio by considering the convolution kernel as the quantization unit. Inspired by the evolution from weight pruning to filter pruning, we propose to quantize at both the kernel and weight level. Instead of representing each weight parameter with a low-bit index, we learn a kernel codebook and replace all kernels in the convolution layer with corresponding low-bit indexes. Thus, KQ can represent the weight tensor in the convolution layer with low-bit indexes and a kernel codebook with limited size, which enables KQ to achieve a significant compression ratio. Then, we conduct a 6-bit parameter quantization on the kernel codebook to further reduce redundancy. Extensive experiments on the ImageNet classification task prove that KQ needs 1.05 and 1.62 bits on average in VGG and ResNet18, respectively, to represent each parameter in the convolution layer and achieves the state-of-the-art compression ratio with little accuracy loss.
Submitted 11 March, 2020;
originally announced March 2020.
-
Robust Data Preprocessing for Machine-Learning-Based Disk Failure Prediction in Cloud Production Environments
Authors:
Shujie Han,
Jun Wu,
Erci Xu,
Cheng He,
Patrick P. C. Lee,
Yi Qiang,
Qixing Zheng,
Tao Huang,
Zixi Huang,
Rui Li
Abstract:
To provide proactive fault tolerance for modern cloud data centers, extensive studies have proposed machine learning (ML) approaches to predict imminent disk failures for early remedy and evaluated their approaches directly on public datasets (e.g., Backblaze SMART logs). However, in real-world production environments, the data quality is imperfect (e.g., inaccurate labeling, missing data samples, and complex failure types), thereby degrading the prediction accuracy. We present RODMAN, a robust data preprocessing pipeline that refines data samples before feeding them into ML models. We start with a large-scale trace-driven study of over three million disks from Alibaba Cloud's data centers, and motivate the practical challenges in ML-based disk failure prediction. We then design RODMAN with three data preprocessing techniques, namely failure-type filtering, spline-based data filling, and automated pre-failure backtracking, that are applicable for general ML models. Evaluation on both the Alibaba and Backblaze datasets shows that RODMAN improves the prediction accuracy compared to without data preprocessing under various settings.
Submitted 20 December, 2019;
originally announced December 2019.
-
Any-Precision Deep Neural Networks
Authors:
Haichao Yu,
Haoxiang Li,
Honghui Shi,
Thomas S. Huang,
Gang Hua
Abstract:
We present any-precision deep neural networks (DNNs), which are trained with a new method that allows the learned DNNs to be flexible in numerical precision during inference. The same model in runtime can be flexibly and directly set to different bit-widths, by truncating the least significant bits, to support dynamic speed and accuracy trade-off. When all layers are set to low-bits, we show that the model achieved accuracy comparable to dedicated models trained at the same precision. This nice property facilitates flexible deployment of deep learning models in real-world applications, where in practice trade-offs between model accuracy and runtime efficiency are often sought. Previous literature presents solutions to train models at each individual fixed efficiency/accuracy trade-off point. But how to produce a model flexible in runtime precision is largely unexplored. When the demand of efficiency/accuracy trade-off varies from time to time or even dynamically changes in runtime, it is infeasible to re-train models accordingly, and the storage budget may forbid keeping multiple models. Our proposed framework achieves this flexibility without performance degradation. More importantly, we demonstrate that this achievement is agnostic to model architectures and applicable to multiple vision tasks. Our code is released at https://github.com/SHI-Labs/Any-Precision-DNNs.
Submitted 15 January, 2021; v1 submitted 17 November, 2019;
originally announced November 2019.
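As a rough illustration of the bit-truncation idea described in the abstract above, the sketch below drops the least significant bits of an unsigned fixed-point code to obtain a lower-precision code. The unsigned 8-bit representation, the function names, and the midpoint dequantization rule are assumptions made for this example, not details from the paper.

```python
def truncate_to_bits(code, full_bits, k):
    """Keep the k most significant bits of an unsigned full_bits-bit code."""
    if not 1 <= k <= full_bits:
        raise ValueError("k must satisfy 1 <= k <= full_bits")
    if not 0 <= code < (1 << full_bits):
        raise ValueError("code does not fit in full_bits bits")
    # Shifting right discards the (full_bits - k) least significant bits.
    return code >> (full_bits - k)


def dequantize(code, k):
    """Map a k-bit code to the midpoint of its bin in [0, 1).

    Assumption: a uniform quantizer over [0, 1); real schemes may differ.
    """
    return (code + 0.5) / (1 << k)


# Example: one stored 8-bit weight code viewed at lower precisions.
stored = 0b10110110
four_bit = truncate_to_bits(stored, 8, 4)  # 0b1011
one_bit = truncate_to_bits(stored, 8, 1)   # 0b1
```

Because every lower-precision code is a prefix of the stored code, a single set of stored values can serve several bit-widths without keeping separate copies.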
-
Adversarial Feature Alignment: Avoid Catastrophic Forgetting in Incremental Task Lifelong Learning
Authors:
Xin Yao,
Tianchi Huang,
Chenglei Wu,
Rui-Xiao Zhang,
Lifeng Sun
Abstract:
Human beings are able to master a variety of knowledge and skills with ongoing learning. By contrast, dramatic performance degradation is observed when new tasks are added to an existing neural network model. This phenomenon, termed Catastrophic Forgetting, is one of the major roadblocks that prevent deep neural networks from achieving human-level artificial intelligence. Several research efforts, e.g. Lifelong or Continual learning algorithms, have been proposed to tackle this problem. However, they either suffer from an accumulating drop in performance as the task sequence grows longer, or require storing an excessive amount of model parameters for historical memory, or cannot obtain competitive performance on the new tasks. In this paper, we focus on the incremental multi-task image classification scenario. Inspired by the learning process of human students, where they usually decompose complex tasks into easier goals, we propose an adversarial feature alignment method to avoid catastrophic forgetting. In our design, both the low-level visual features and high-level semantic features serve as soft targets and guide the training process in multiple stages, which provide sufficient supervised information of the old tasks and help to reduce forgetting. Due to the knowledge distillation and regularization phenomena, the proposed method gains even better performance than finetuning on the new tasks, which makes it stand out from other methods. Extensive experiments in several typical lifelong learning scenarios demonstrate that our method outperforms the state-of-the-art methods in both accuracies on new tasks and performance preservation on old tasks.
Submitted 24 October, 2019;
originally announced October 2019.
-
Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating
Authors:
Xin Yao,
Tianchi Huang,
Rui-Xiao Zhang,
Ruiyu Li,
Lifeng Sun
Abstract:
Federated learning (FL) aims to train machine learning models in a decentralized system consisting of an enormous number of smart edge devices. Federated averaging (FedAvg), the fundamental algorithm in FL settings, proposes on-device training and model aggregation to avoid the potential heavy communication costs and privacy concerns brought by transmitting raw data. However, through theoretical analysis we argue that 1) the multiple steps of local updating will result in gradient biases and 2) there is an inconsistency between the expected target distribution and the optimization objectives following the training paradigm in FedAvg. To tackle these problems, we first propose an unbiased gradient aggregation algorithm with the keep-trace gradient descent and the gradient evaluation strategy. Then we introduce an additional controllable meta updating procedure with a small set of data samples, indicating the expected target distribution, to provide a clear and consistent optimization objective. Both improvements are model- and task-agnostic and can be applied individually or together. Experimental results demonstrate that the proposed methods are faster in convergence and achieve higher accuracy with different network architectures in various FL settings.
Submitted 16 December, 2020; v1 submitted 17 October, 2019;
originally announced October 2019.
-
Panoptic-DeepLab
Authors:
Bowen Cheng,
Maxwell D. Collins,
Yukun Zhu,
Ting Liu,
Thomas S. Huang,
Hartwig Adam,
Liang-Chieh Chen
Abstract:
We present Panoptic-DeepLab, a bottom-up and single-shot approach for panoptic segmentation. Our Panoptic-DeepLab is conceptually simple and delivers state-of-the-art results. In particular, we adopt the dual-ASPP and dual-decoder structures specific to semantic and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. Our single Panoptic-DeepLab sets the new state of the art on all three Cityscapes benchmarks, reaching 84.2% mIoU, 39.0% AP, and 65.5% PQ on the test set, and advances results on the other challenging Mapillary Vistas.
Submitted 23 October, 2019; v1 submitted 10 October, 2019;
originally announced October 2019.
-
Federated Learning with Additional Mechanisms on Clients to Reduce Communication Costs
Authors:
Xin Yao,
Tianchi Huang,
Chenglei Wu,
Rui-Xiao Zhang,
Lifeng Sun
Abstract:
Federated learning (FL) enables on-device training over distributed networks consisting of a massive amount of modern smart devices, such as smartphones and IoT (Internet of Things) devices. However, the leading optimization algorithm in such settings, i.e., federated averaging (FedAvg), suffers from heavy communication costs and the inevitable performance drop, especially when the local data is distributed in a non-IID way. To alleviate this problem, we propose two potential solutions by introducing additional mechanisms to the on-device training.
The first (FedMMD) is adopting a two-stream model with the MMD (Maximum Mean Discrepancy) constraint instead of a single model in vanilla FedAvg to be trained on devices. Experiments show that the proposed method outperforms baselines, especially in non-IID FL settings, with a reduction of more than 20% in required communication rounds.
The second is FL with feature fusion (FedFusion). By aggregating the features from both the local and global models, we achieve higher accuracy at fewer communication costs. Furthermore, the feature fusion modules offer better initialization for newly incoming clients and thus speed up the process of convergence. Experiments in popular FL scenarios show that our FedFusion outperforms baselines in both accuracy and generalization ability while reducing the number of required communication rounds by more than 60%.
Submitted 1 September, 2019; v1 submitted 16 August, 2019;
originally announced August 2019.
-
FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction
Authors:
Tongwen Huang,
Zhiqi Zhang,
Junlin Zhang
Abstract:
Advertising and feed ranking are essential to many Internet companies such as Facebook and Sina Weibo. Among many real-world advertising and feed ranking systems, click through rate (CTR) prediction plays a central role. There are many proposed models in this field such as logistic regression, tree based models, factorization machine based models and deep learning based CTR models. However, many current works calculate the feature interactions in a simple way such as Hadamard product and inner product and they care less about the importance of features. In this paper, a new model named FiBiNET as an abbreviation for Feature Importance and Bilinear feature Interaction NETwork is proposed to dynamically learn the feature importance and fine-grained feature interactions. On the one hand, the FiBiNET can dynamically learn the importance of features via the Squeeze-Excitation network (SENET) mechanism; on the other hand, it is able to effectively learn the feature interactions via bilinear function. We conduct extensive experiments on two real-world datasets and show that our shallow model outperforms other shallow models such as factorization machine(FM) and field-aware factorization machine(FFM). In order to improve performance further, we combine a classical deep neural network(DNN) component with the shallow model to be a deep model. The deep FiBiNET consistently outperforms the other state-of-the-art deep models such as DeepFM and extreme deep factorization machine(XdeepFM).
Submitted 22 May, 2019;
originally announced May 2019.
-
High Frequency Residual Learning for Multi-Scale Image Classification
Authors:
Bowen Cheng,
Rong Xiao,
Jianfeng Wang,
Thomas Huang,
Lei Zhang
Abstract:
We present a novel high frequency residual learning framework, which leads to a highly efficient multi-scale network (MSNet) architecture for mobile and embedded vision problems. The architecture utilizes two networks: a low resolution network to efficiently approximate low frequency components and a high resolution network to learn high frequency residuals by reusing the upsampled low resolution features. With a classifier calibration module, MSNet can dynamically allocate computation resources during inference to achieve a better speed and accuracy trade-off. We evaluate our methods on the challenging ImageNet-1k dataset and observe consistent improvements over different base networks. On ResNet-18 and MobileNet with alpha=1.0, MSNet gains 1.5% accuracy over both architectures without increasing computations. On the more efficient MobileNet with alpha=0.25, our method gains 3.8% accuracy with the same amount of computations.
Submitted 7 May, 2019;
originally announced May 2019.
-
Reconstruction of Natural Visual Scenes from Neural Spikes with Deep Neural Networks
Authors:
Yichen Zhang,
Shanshan Jia,
Yajing Zheng,
Zhaofei Yu,
Yonghong Tian,
Siwei Ma,
Tiejun Huang,
Jian K. Liu
Abstract:
Neural coding is one of the central questions in systems neuroscience for understanding how the brain processes stimuli from the environment; moreover, it is also a cornerstone for designing algorithms for brain-machine interfaces, where decoding incoming stimuli is in high demand for better performance of physical devices. Traditionally, researchers have focused on functional magnetic resonance imaging (fMRI) data as the neural signals of interest for decoding visual scenes. However, our visual perception operates on a fast time scale of milliseconds in terms of an event termed a neural spike. There are few studies of decoding using spikes. Here we fulfill this aim by developing a novel decoding framework based on deep neural networks, named spike-image decoder (SID), for reconstructing natural visual scenes, including static images and dynamic videos, from experimentally recorded spikes of a population of retinal ganglion cells. The SID is an end-to-end decoder with one end as neural spikes and the other end as images, which can be trained directly such that visual scenes are reconstructed from spikes in a highly accurate fashion. Our SID also outperforms existing fMRI decoding models on the reconstruction of visual stimuli. In addition, with the aid of a spike encoder, we show that SID can be generalized to arbitrary visual scenes by using the image datasets of MNIST, CIFAR10, and CIFAR100. Furthermore, with a pre-trained SID, one can decode any dynamic videos to achieve real-time encoding and decoding of visual scenes by spikes. Altogether, our results shed new light on neuromorphic computing for artificial visual systems, such as event-based visual cameras and visual neuroprostheses.
Submitted 28 January, 2020; v1 submitted 29 April, 2019;
originally announced April 2019.
-
Poisson PCA: Poisson Measurement Error corrected PCA, with Application to Microbiome Data
Authors:
Toby Kenney,
Tianshu Huang,
Hong Gu
Abstract:
In this paper, we study the problem of computing a Principal Component Analysis of data affected by Poisson noise. We assume samples are drawn from independent Poisson distributions. We want to estimate principal components of a fixed transformation of the latent Poisson means. Our motivating example is microbiome data, though the methods apply to many other situations. We develop a semiparametric approach to correct the bias of variance estimators, both for untransformed and transformed (with particular attention to log-transformation) Poisson means. Furthermore, we incorporate methods for correcting different exposure or sequencing depth in the data. In addition to identifying the principal components, we also address the non-trivial problem of computing the principal scores in this semiparametric framework. Most previous approaches tend to take a more parametric line, for example the Poisson-log-normal (PLN) model approach. We compare our method with the PLN approach and find that our method is better at identifying the main principal components of the latent log-transformed Poisson means, and as a further major advantage, takes far less time to compute. Comparing methods on real data, we see that our method also appears to be more robust to outliers than the parametric method.
Submitted 26 April, 2019;
originally announced April 2019.
-
Probabilistic Inference of Binary Markov Random Fields in Spiking Neural Networks through Mean-field Approximation
Authors:
Yajing Zheng,
Shanshan Jia,
Zhaofei Yu,
Tiejun Huang,
Jian K. Liu,
Yonghong Tian
Abstract:
Recent studies have suggested that the cognitive process of the human brain is realized as probabilistic inference and can be further modeled by probabilistic graphical models like Markov random fields. Nevertheless, it remains unclear how probabilistic inference can be implemented by a network of spiking neurons in the brain. Previous studies have tried to relate the inference equation of binary Markov random fields to the dynamic equation of spiking neural networks through belief propagation algorithm and reparameterization, but they are valid only for Markov random fields with limited network structure. In this paper, we propose a spiking neural network model that can implement inference of arbitrary binary Markov random fields. Specifically, we design a spiking recurrent neural network and prove that its neuronal dynamics are mathematically equivalent to the inference process of Markov random fields by adopting mean-field theory. Furthermore, our mean-field approach unifies previous works. Theoretical analysis and experimental results, together with the application to image denoising, demonstrate that our proposed spiking neural network can get comparable results to that of mean-field inference.
Submitted 12 March, 2020; v1 submitted 22 February, 2019;
originally announced February 2019.
-
Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods
Authors:
Maher Nouiehed,
Maziar Sanjabi,
Tianjian Huang,
Jason D. Lee,
Meisam Razaviyayn
Abstract:
Recent applications that arise in machine learning have sparked significant interest in solving min-max saddle point games. This problem has been extensively studied in the convex-concave regime, for which a global equilibrium solution can be computed efficiently. In this paper, we study the problem in the non-convex regime and show that an $\varepsilon$-first-order stationary point of the game can be computed when the objective of one of the players can be optimized to global optimality efficiently. In particular, we first consider the case where the objective of one of the players satisfies the Polyak-Łojasiewicz (PL) condition. For such a game, we show that a simple multi-step gradient descent-ascent algorithm finds an $\varepsilon$-first-order stationary point of the problem in $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ iterations. Then we show that our framework can also be applied to the case where the objective of the "max-player" is concave. In this case, we propose a multi-step gradient descent-ascent algorithm that finds an $\varepsilon$-first-order stationary point of the game in $\widetilde{\mathcal{O}}(\varepsilon^{-3.5})$ iterations, which is the best known rate in the literature. We applied our algorithm to a fair classification problem on the Fashion-MNIST dataset and observed that the proposed algorithm results in smoother training and better generalization.
Submitted 30 October, 2019; v1 submitted 21 February, 2019;
originally announced February 2019.
-
FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary
Authors:
Yingzhen Yang,
Jiahui Yu,
Nebojsa Jojic,
Jun Huan,
Thomas S. Huang
Abstract:
We present a novel method of compression of deep Convolutional Neural Networks (CNNs) by weight sharing through a new representation of convolutional filters. The proposed method reduces the number of parameters of each convolutional layer by learning a 1D vector termed Filter Summary (FS). The convolutional filters are located in FS as overlapping 1D segments, and nearby filters in FS share weights in their overlapping regions in a natural way. The resultant neural network based on such a weight sharing scheme, termed Filter Summary CNNs or FSNet, has an FS in each convolution layer instead of a set of independent filters in the conventional convolution layer. FSNet has the same architecture as that of the baseline CNN to be compressed, and each convolution layer of FSNet has the same number of filters from FS as that of the baseline CNN in the forward process. With a compelling computational acceleration ratio, the parameter space of FSNet is much smaller than that of the baseline CNN. In addition, FSNet is quantization friendly. FSNet with weight quantization leads to an even higher compression ratio without noticeable performance loss. We further propose Differentiable FSNet (DFSNet), where the way filters share weights is learned in a differentiable and end-to-end manner. Experiments demonstrate the effectiveness of FSNet in compression of CNNs for computer vision tasks including image classification and object detection, and the effectiveness of DFSNet is evidenced by the task of Neural Architecture Search.
Submitted 10 April, 2020; v1 submitted 8 February, 2019;
originally announced February 2019.
-
An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity
Authors:
Yingzhen Yang,
Jiahui Yu,
Xingjian Li,
Jun Huan,
Thomas S. Huang
Abstract:
Regularization of Deep Neural Networks (DNNs) for the sake of improving their generalization capability is important and challenging. The development in this line benefits the theoretical foundation of DNNs and promotes their usability in different areas of artificial intelligence. In this paper, we investigate the role of Rademacher complexity in improving generalization of DNNs and propose a novel regularizer rooted in Local Rademacher Complexity (LRC). While Rademacher complexity is well known as a distribution-free complexity measure of a function class that helps boost generalization of statistical learning methods, extensive study shows that LRC, its counterpart focusing on a restricted function class, leads to sharper convergence rates and potentially better generalization given a finite training sample. Our LRC-based regularizer is developed by estimating the complexity of the function class centered at the minimizer of the empirical loss of DNNs. Experiments on various types of network architecture demonstrate the effectiveness of LRC regularization in improving generalization. Moreover, our method features the state-of-the-art result on the CIFAR-10 dataset with a network architecture found by neural architecture search.
Submitted 16 November, 2019; v1 submitted 3 February, 2019;
originally announced February 2019.
-
Self-similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-identification
Authors:
Yang Fu,
Yunchao Wei,
Guanshuo Wang,
Yuqian Zhou,
Honghui Shi,
Thomas Huang
Abstract:
Domain adaptation in person re-identification (re-ID) has always been a challenging task. In this work, we explore how to harness the natural similar characteristics existing in the samples from the target domain for learning to conduct person re-ID in an unsupervised manner. Concretely, we propose a Self-similarity Grouping (SSG) approach, which exploits the potential similarity (from global body to local parts) of unlabeled samples to automatically build multiple clusters from different views. These independent clusters are then assigned labels, which serve as the pseudo identities to supervise the training process. We repeatedly and alternately conduct such a grouping and training process until the model is stable. Despite the apparent simplicity, our SSG outperforms the state of the art by more than 4.6% (DukeMTMC to Market1501) and 4.4% (Market1501 to DukeMTMC) in mAP, respectively. Upon our SSG, we further introduce a clustering-guided semisupervised approach named SSG ++ to conduct one-shot domain adaptation in an open set setting (i.e. the number of independent identities from the target domain is unknown). Without spending much effort on labeling, our SSG ++ can further promote the mAP upon SSG by 10.7% and 6.9%, respectively. Our code is available at: https://github.com/OasisYang/SSG .
Submitted 23 September, 2019; v1 submitted 25 November, 2018;
originally announced November 2018.
-
A Flexible Spatial Autoregressive Modelling Framework for Mixed Covariates of Multiple Data Types
Authors:
Huiwen Wang,
Tingting Huang,
Shanshan Wang
Abstract:
Mixed spatial autoregressive (SAR) models with numerical covariates have been well studied. However, as non-numerical data, such as functional data and compositional data, receive substantial amounts of attention and are applied to economics, medicine and meteorology, it becomes necessary to develop flexible SAR models with multiple data types. In this article, we integrate three types of covariates, functional, compositional and numerical, in an SAR model. The new model has the merits of classical functional linear models and compositional linear models with scalar responses. Moreover, we develop an estimation method for the proposed model, which is based on functional principal component analysis (FPCA), the isometric logratio (ilr) transformation and the maximum likelihood estimation method. Monte Carlo experiments demonstrate the effectiveness of the estimators. A real dataset is also used to illustrate the utility of the proposed model.
Submitted 7 November, 2018;
originally announced November 2018.
-
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Authors:
Spyridon Bakas,
Mauricio Reyes,
Andras Jakab,
Stefan Bauer,
Markus Rempfler,
Alessandro Crimi,
Russell Takeshi Shinohara,
Christoph Berger,
Sung Min Ha,
Martin Rozycki,
Marcel Prastawa,
Esther Alberts,
Jana Lipkova,
John Freymann,
Justin Kirby,
Michel Bilello,
Hassan Fathallah-Shaykh,
Roland Wiest,
Jan Kirschke,
Benedikt Wiestler,
Rivka Colen,
Aikaterini Kotrotsou,
Pamela Lamontagne,
Daniel Marcus,
Mikhail Milchenko
, et al. (402 additional authors not shown)
Abstract:
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
Submitted 23 April, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.
-
Spatial Functional Linear Model and its Estimation Method
Authors:
Tingting Huang,
Gilbert Saporta,
Huiwen Wang,
Shanshan Wang
Abstract:
The classical functional linear regression model (FLM) and its extensions, which are based on the assumption that all individuals are mutually independent, have been well studied and are used by many researchers. This independence assumption is sometimes violated in practice, especially when data with a network structure are collected in scientific disciplines including marketing, sociology and spatial economics. However, relatively few studies have examined the applications of FLM to data with network structures. We propose a novel spatial functional linear model (SFLM) that incorporates a spatial autoregressive parameter and a spatial weight matrix into FLM to accommodate spatial dependencies among individuals. The proposed model is relatively flexible as it takes advantage of FLM in handling high-dimensional covariates and of the spatial autoregressive (SAR) model in capturing network dependencies. We develop an estimation method based on functional principal component analysis (FPCA) and maximum likelihood estimation. Simulation studies show that our method performs as well as the FPCA-based method used with FLM when no network structure is present, and outperforms the latter when network structure is present. A real weather dataset is also employed to demonstrate the utility of the SFLM.
Submitted 1 November, 2018;
originally announced November 2018.
-
Graph Convolutional Reinforcement Learning
Authors:
Jiechuan Jiang,
Chen Dun,
Tiejun Huang,
Zongqing Lu
Abstract:
Learning to cooperate is crucially important in multi-agent environments. The key is to understand the mutual interplay between agents. However, multi-agent environments are highly dynamic, where agents keep moving and their neighbors change quickly. This makes it hard to learn abstract representations of mutual interplay between agents. To tackle these difficulties, we propose graph convolutional reinforcement learning, where graph convolution adapts to the dynamics of the underlying graph of the multi-agent environment, and relation kernels capture the interplay between agents by their relation representations. Latent features produced by convolutional layers from gradually increased receptive fields are exploited to learn cooperation, and cooperation is further improved by temporal relation regularization for consistency. Empirically, we show that our method substantially outperforms existing methods in a variety of cooperative scenarios.
Submitted 11 February, 2020; v1 submitted 22 October, 2018;
originally announced October 2018.
-
Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks
Authors:
Henggang Cui,
Vladan Radosavljevic,
Fang-Chieh Chou,
Tsung-Han Lin,
Thi Nguyen,
Tzu-Kuo Huang,
Jeff Schneider,
Nemanja Djuric
Abstract:
Autonomous driving presents one of the largest problems that the robotics and artificial intelligence communities are facing at the moment, both in terms of difficulty and potential societal impact. Self-driving vehicles (SDVs) are expected to prevent road accidents and save millions of lives while improving the livelihood and life quality of many more. However, despite large interest and a number of industry players working in the autonomous domain, there still remains more to be done in order to develop a system capable of operating at a level comparable to best human drivers. One reason for this is high uncertainty of traffic behavior and large number of situations that an SDV may encounter on the roads, making it very difficult to create a fully generalizable system. To ensure safe and efficient operations, an autonomous vehicle is required to account for this uncertainty and to anticipate a multitude of possible behaviors of traffic actors in its surrounding. We address this critical problem and present a method to predict multiple possible trajectories of actors while also estimating their probabilities. The method encodes each actor's surrounding context into a raster image, used as input by deep convolutional networks to automatically derive relevant features for the task. Following extensive offline evaluation and comparison to state-of-the-art baselines, the method was successfully tested on SDVs in closed-course tests.
Submitted 1 March, 2019; v1 submitted 18 September, 2018;
originally announced September 2018.