Search | arXiv e-print repository

Guaranteed Sampling Flexibility for Low-tubal-rank Tensor Completion

Authors: Bowen Su, Juntao You, HanQin Cai, Longxiu Huang

Abstract: While Bernoulli sampling is extensively studied in tensor completion, t-CUR sampling approximates low-tubal-rank tensors via lateral and horizontal subtensors. However, both methods lack sufficient flexibility for diverse practical applications. To address this, we introduce Tensor Cross-Concentrated Sampling (t-CCS), a novel and straightforward sampling model that advances the matrix cross-concentrated sampling concept within a tensor framework. t-CCS effectively bridges the gap between Bernoulli and t-CUR sampling, offering additional flexibility that can lead to computational savings in various contexts. A key aspect of our work is the comprehensive theoretical analysis provided. We establish a sufficient condition for the successful recovery of a low-rank tensor from its t-CCS samples. In support of this, we also develop a theoretical framework validating the feasibility of t-CUR via uniform random sampling and conduct a detailed theoretical sampling complexity analysis for tensor completion problems utilizing the general Bernoulli sampling model. Moreover, we introduce an efficient non-convex algorithm, the Iterative t-CUR Tensor Completion (ITCURTC) algorithm, specifically designed to tackle the t-CCS-based tensor completion. We have intensively tested and validated the effectiveness of the t-CCS model and the ITCURTC algorithm across both synthetic and real-world datasets.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.07409 [pdf, other]

Accelerating Ill-conditioned Hankel Matrix Recovery via Structured Newton-like Descent

Authors: HanQin Cai, Longxiu Huang, Xiliang Lu, Juntao You

Abstract: This paper studies the robust Hankel recovery problem, which simultaneously removes the sparse outliers and fulfills missing entries from the partial observation. We propose a novel non-convex algorithm, coined Hankel Structured Newton-Like Descent (HSNLD), to tackle the robust Hankel recovery problem. HSNLD is highly efficient with linear convergence, and its convergence rate is independent of the condition number of the underlying Hankel matrix. The recovery guarantee has been established under some mild conditions. Numerical experiments on both synthetic and real datasets show the superior performance of HSNLD against state-of-the-art algorithms.

Submitted 11 June, 2024; originally announced June 2024.

MSC Class: 15A29; 15A83; 47B35; 90C17; 90C26; 90C53

arXiv:2406.05822 [pdf, other]

Symmetric Matrix Completion with ReLU Sampling

Authors: Huikang Liu, Peng Wang, Longxiu Huang, Qing Qu, Laura Balzano

Abstract: We study the problem of symmetric positive semi-definite low-rank matrix completion (MC) with deterministic entry-dependent sampling. In particular, we consider rectified linear unit (ReLU) sampling, where only positive entries are observed, as well as a generalization to threshold-based sampling. We first empirically demonstrate that the landscape of this MC problem is not globally benign: Gradient descent (GD) with random initialization will generally converge to stationary points that are not globally optimal. Nevertheless, we prove that when the matrix factor with a small rank satisfies mild assumptions, the nonconvex objective function is geodesically strongly convex on the quotient manifold in a neighborhood of a planted low-rank matrix. Moreover, we show that our assumptions are satisfied by a matrix factor with i.i.d. Gaussian entries. Finally, we develop a tailor-designed initialization for GD to solve our studied formulation, which empirically always achieves convergence to the global minima. We also conduct extensive experiments and compare MC methods, investigating convergence and completion performance with respect to initialization, noise level, dimension, and rank.

Submitted 9 June, 2024; originally announced June 2024.

Comments: 39 pages, 9 figures; This work has been accepted for publication in the Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

arXiv:2406.00539 [pdf, other]

CONFINE: Conformal Prediction for Interpretable Neural Networks

Authors: Linhui Huang, Sayeri Lala, Niraj K. Jha

Abstract: Deep neural networks exhibit remarkable performance, yet their black-box nature limits their utility in fields like healthcare where interpretability is crucial. Existing explainability approaches often sacrifice accuracy and lack quantifiable measures of prediction uncertainty. In this study, we introduce Conformal Prediction for Interpretable Neural Networks (CONFINE), a versatile framework that generates prediction sets with statistically robust uncertainty estimates instead of point predictions to enhance model transparency and reliability. CONFINE not only provides example-based explanations and confidence estimates for individual predictions but also boosts accuracy by up to 3.6%. We define a new metric, correct efficiency, to evaluate the fraction of prediction sets that contain precisely the correct label and show that CONFINE achieves correct efficiency of up to 3.3% higher than the original accuracy, matching or exceeding prior methods. CONFINE's marginal and class-conditional coverages attest to its validity across tasks spanning medical image classification to language understanding. Being adaptable to any pre-trained classifier, CONFINE marks a significant advance towards transparent and trustworthy deep learning applications in critical domains.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2403.02625 [pdf, ps, other]

Determining the Number of Common Functional Factors with Twice Cross-Validation

Authors: Hui Jiang, Lei Huang, Shengfan Wu

Abstract: The semiparametric factor model serves as a vital tool to describe the dependence patterns in the data. It recognizes that the common features observed in the data are actually explained by functions of specific exogenous variables. Unlike traditional factor models, where the focus is on selecting the number of factors, our objective here is to identify the appropriate number of common functions, a crucial parameter in this model. In this paper, we develop a novel data-driven method to determine the number of functional factors using cross validation (CV). Our proposed method employs a two-step CV process that ensures the orthogonality of functional factors, which we refer to as Functional Twice Cross-Validation (FTCV). Extensive simulations demonstrate that FTCV accurately selects the number of common functions and outperforms existing methods in most cases. Furthermore, by specifying market volatility as the exogenous force, we provide real data examples that illustrate the interpretability of selected common functions in characterizing the influence on U.S. Treasury Yields and the cross correlations between Dow30 returns.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.18149 [pdf, ps, other]

Provably Efficient Partially Observable Risk-Sensitive Reinforcement Learning with Hindsight Observation

Authors: Tonghe Zhang, Yu Chen, Longbo Huang

Abstract: This work pioneers regret analysis of risk-sensitive reinforcement learning in partially observable environments with hindsight observation, addressing a gap in theoretical exploration. We introduce a novel formulation that integrates hindsight observations into a Partially Observable Markov Decision Process (POMDP) framework, where the goal is to optimize accumulated reward under the entropic risk measure. We develop the first provably efficient RL algorithm tailored for this setting. We also prove by rigorous analysis that our algorithm achieves polynomial regret $\tilde{O}\left(\frac{e^{|\gamma|H}-1}{|\gamma|H}H^2\sqrt{KHS^2OA}\right)$, which outperforms or matches existing upper bounds when the model degenerates to risk-neutral or fully observable settings. We adopt the method of change-of-measure and develop a novel analytical tool of beta vectors to streamline mathematical derivations. These techniques are of particular interest to the theoretical study of reinforcement learning.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 38 pages

arXiv:2402.03447 [pdf, other]

Challenges in Variable Importance Ranking Under Correlation

Authors: Annie Liang, Thomas Jemielita, Andy Liaw, Vladimir Svetnik, Lingkang Huang, Richard Baumgartner, Jason M. Klusowski

Abstract: Variable importance plays a pivotal role in interpretable machine learning as it helps measure the impact of factors on the output of the prediction model. Model agnostic methods based on the generation of "null" features via permutation (or related approaches) can be applied. Such analysis is often utilized in pharmaceutical applications due to its ability to interpret black-box models, including tree-based ensembles. A major challenge and significant confounder in variable importance estimation, however, is the presence of between-feature correlation. Recently, several adjustments to marginal permutation utilizing feature knockoffs were proposed to address this issue, such as the variable importance measure known as conditional predictive impact (CPI). Assessment and evaluation of such approaches is the focus of our work. We first present a comprehensive simulation study investigating the impact of feature correlation on the assessment of variable importance. We then theoretically prove the limitation that highly correlated features pose for the CPI through the knockoff construction. While we expect that there is always no correlation between knockoff variables and their corresponding predictor variables, we prove that the correlation increases linearly beyond a certain correlation threshold between the predictor variables. Our findings emphasize the absence of free lunch when dealing with high feature correlation, as well as the necessity of understanding the utility and limitations behind methods in variable importance estimation.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.15566 [pdf, other]

On the Robustness of Cross-Concentrated Sampling for Matrix Completion

Authors: HanQin Cai, Longxiu Huang, Chandra Kundu, Bowen Su

Abstract: Matrix completion is one of the crucial tools in modern data science research. Recently, a novel sampling model for matrix completion coined cross-concentrated sampling (CCS) has caught much attention. However, the robustness of the CCS model against sparse outliers remains unclear in the existing studies. In this paper, we aim to answer this question by exploring a novel Robust CCS Completion problem. A highly efficient non-convex iterative algorithm, dubbed Robust CUR Completion (RCURC), is proposed. The empirical performance of the proposed algorithm, in terms of both efficiency and robustness, is verified in synthetic and real datasets.

Submitted 27 January, 2024; originally announced January 2024.

Comments: 58th Annual Conference of Information Sciences and Systems

arXiv:2310.13969 [pdf, ps, other]

Distributed Linear Regression with Compositional Covariates

Authors: Yue Chao, Lei Huang, Xuejun Ma

Abstract: With the availability of extraordinarily huge data sets, solving the problems of distributed statistical methodology and computing for such data sets has become increasingly crucial in the big data area. In this paper, we focus on the distributed sparse penalized linear log-contrast model in massive compositional data. In particular, two distributed optimization techniques under centralized and decentralized topologies are proposed for solving the two different constrained convex optimization problems. Both proposed algorithms are based on the frameworks of Alternating Direction Method of Multipliers (ADMM) and Coordinate Descent Method of Multipliers (CDMM, Lin et al., 2014, Biometrika). It is worth emphasizing that, in the decentralized topology, we introduce a distributed coordinate-wise descent algorithm based on Group ADMM (GADMM, Elgabli et al., 2020, Journal of Machine Learning Research) for obtaining a communication-efficient regularized estimation. Correspondingly, the convergence theories of the proposed algorithms are rigorously established under some regularity conditions. Numerical experiments on both synthetic and real data are conducted to evaluate our proposed algorithms.

Submitted 21 October, 2023; originally announced October 2023.

Comments: 35 pages, 2 figures

MSC Class: 62-08; ACM Class: G.3

arXiv:2307.11214 [pdf, other]

FairMobi-Net: A Fairness-aware Deep Learning Model for Urban Mobility Flow Generation

Authors: Zhewei Liu, Lipai Huang, Chao Fan, Ali Mostafavi

Abstract: Generating realistic human flows across regions is essential for our understanding of urban structures and population activity patterns, enabling important applications in the fields of urban planning and management. However, a notable shortcoming of most existing mobility generation methodologies is neglect of prediction fairness, which can result in underestimation of mobility flows across regions with vulnerable population groups, potentially resulting in inequitable resource distribution and infrastructure development. To overcome this limitation, our study presents a novel, fairness-aware deep learning model, FairMobi-Net, for inter-region human flow prediction. The FairMobi-Net model uniquely incorporates fairness loss into the loss function and employs a hybrid approach, merging binary classification and numerical regression techniques for human flow prediction. We validate the FairMobi-Net model using comprehensive human mobility datasets from four U.S. cities, predicting human flow at the census-tract level. Our findings reveal that the FairMobi-Net model outperforms state-of-the-art models (such as the DeepGravity model) in producing more accurate and equitable human flow predictions across a variety of region pairs, regardless of regional income differences. The model maintains a high degree of accuracy consistently across diverse regions, addressing the previous fairness concern. Further analysis of feature importance elucidates the impact of physical distances and road network structures on human flows across regions. With fairness as its touchstone, the model and results provide researchers and practitioners across the fields of urban sciences, transportation engineering, and computing with an effective tool for accurate generation of human mobility flows across regions.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2303.01131 [pdf]

Association Among Gender, Age, and Region in Taiwan's First Ten Thousand COVID-19 Cases: A Log-linear-model Analysis

Authors: Tai-Cheng Hung, Li-Shan Huang

Abstract: Objectives: We explore the association between age, gender, and region among Taiwan's 11290 local Covid-19 cases from January 22, 2020 to June 11, 2021. Methods: Using open data from Taiwan's CDC, we organize them into a three-dimensional contingency table. The groups are gender, age 0-29, 30-59, and 60+ years old, and two classifications for region: (1) 7 commonly-defined regions, (2) 12 groups separating Taipei, New Taipei, Keelung, Taoyuan, Hsinchu county, Miaoli county, and Hsinchu city. We adopt the log-linear model for statistical analysis and use the BIC for model selection. Results: The model with three pairwise interaction terms has the smallest BIC. In terms of interaction effects, there are more females than males among 30-59 (p<0.001), while more males than females among 60+ (p=0.028). Miaoli County has more male than female cases (p<0.001). Differences between 30-59 and 0-29 (baseline), and between 60+ and 0-29 are significant in Taipei (p=0.002 and p<0.001); similar age effects are observed for New Taipei; Miaoli County has a significant difference between 60+ and 0-29 (p<0.001). None of Taoyuan's interaction terms are significant. The main effects of age, the differences between 30-59 and 0-29 (baseline), and between 60+ and 0-29, are both significant (p=0.002 and p=0.046). Conclusions: In the four regions with larger numbers of Covid-19 cases, the age and gender characteristics of the infected population are different, reflecting patterns of infection chains.

Submitted 2 March, 2023; originally announced March 2023.

Comments: 19 pages, 4 tables, 2 figures

MSC Class: 62P10 (Primary); 62J12 (Secondary)

arXiv:2210.08228 [pdf, other]

Nonparametric Estimation of Mediation Effects with A General Treatment

Authors: Lukang Huang, Wei Huang, Oliver Linton, Zheng Zhang

Abstract: To investigate causal mechanisms, causal mediation analysis decomposes the total treatment effect into the natural direct and indirect effects. This paper examines the estimation of the direct and indirect effects in a general treatment effect model, where the treatment can be binary, multi-valued, continuous, or a mixture. We propose generalized weighting estimators with weights estimated by solving an expanding set of equations. Under some sufficient conditions, we show that the proposed estimators are consistent and asymptotically normal. Specifically, when the treatment is discrete, the proposed estimators attain the semiparametric efficiency bounds. Meanwhile, when the treatment is continuous, the convergence rates of the proposed estimators are slower than $N^{-1/2}$; however, they are still more efficient than that constructed from the true weighting function. A simulation study reveals that our estimators exhibit a satisfactory finite-sample performance, while an application shows their practical value.

Submitted 22 January, 2024; v1 submitted 15 October, 2022; originally announced October 2022.

arXiv:2210.05122 [pdf, other]

doi 10.1103/PhysRevE.107.024128

Universal cover-time distribution of heterogeneous random walks

Authors: Jia-Qi Dong, Wen-Hui Han, Yisen Wang, Xiao-Song Chen, Liang Huang

Abstract: The cover-time problem, i.e., time to visit every site in a system, is one of the key issues of random walks with wide applications in natural, social, and engineered systems. Addressing the full distribution of cover times for random walk on complex structures has been a long-standing challenge and has attracted persistent efforts. Yet, the known results are essentially limited to homogeneous systems, where different sites are on an equal footing and have identical or close mean first-passage times, such as random walks on a torus. In contrast, realistic random walks are prevailingly heterogeneous with diversified mean first-passage times. Does a universal distribution still exist? Here, by considering the most general situations, we uncover a generalized rescaling relation for the cover time, exploiting the diversified mean first-passage times that have not been accounted for before. This allows us to concretely establish a universal distribution of the rescaled cover times for heterogeneous random walks, which turns out to be the Gumbel universality class that is ubiquitous for a large family of extreme value statistics. Our analysis is based on the transfer matrix framework, which is generic that besides heterogeneity, it is also robust against biased protocols, directed links, and self-connecting loops. The finding is corroborated with extensive numerical simulations of diverse heterogeneous non-compact random walks on both model and realistic topological structures. Our new technical ingredient may be exploited for other extreme value or ergodicity problems with nonidentical distributions.

Submitted 10 October, 2022; originally announced October 2022.

Comments: 12 pages, 6 figures

arXiv:2208.04298 [pdf, other]

doi 10.3390/s22145462

Gaze Estimation Approach Using Deep Differential Residual Network

Authors: Longzhao Huang, Yujie Li, Xu Wang, Haoyu Wang, Ahmed Bouridane, Ahmad Chaddad

Abstract: Gaze estimation, which is a method to determine where a person is looking at given the person's full face, is a valuable clue for understanding human intention. Similarly to other domains of computer vision, deep learning (DL) methods have gained recognition in the gaze estimation domain. However, there are still gaze calibration problems in the gaze estimation domain, thus preventing existing methods from further improving the performances. An effective solution is to directly predict the difference information of two human eyes, such as the differential network (Diff-Nn). However, this solution results in a loss of accuracy when using only one inference image. We propose a differential residual model (DRNet) combined with a new loss function to make use of the difference information of two eye images. We treat the difference information as auxiliary information. We assess the proposed model (DRNet) mainly using two public datasets: (1) MpiiGaze and (2) Eyediap. Considering only the eye features, DRNet outperforms the state-of-the-art gaze estimation methods with angular errors of 4.57 and 6.14 using MpiiGaze and Eyediap datasets, respectively. Furthermore, the experimental results also demonstrate that DRNet is extremely robust to noise images.

Submitted 8 August, 2022; originally announced August 2022.

Journal ref: Sensors 2022, 22(14), 5462

arXiv:2203.14702 [pdf, other]

Bi-level Doubly Variational Learning for Energy-based Latent Variable Models

Authors: Ge Kan, Jinhu Lü, Tian Wang, Baochang Zhang, Aichun Zhu, Lei Huang, Guodong Guo, Hichem Snoussi

Abstract: Energy-based latent variable models (EBLVMs) are more expressive than conventional energy-based models. However, their potential on visual tasks is limited by their training process based on maximum likelihood estimate that requires sampling from two intractable distributions. In this paper, we propose Bi-level doubly variational learning (BiDVL), which is based on a new bi-level optimization framework and two tractable variational distributions to facilitate learning EBLVMs. Particularly, we lead a decoupled EBLVM consisting of a marginal energy-based distribution and a structural posterior to handle the difficulties when learning deep EBLVMs on images. By choosing a symmetric KL divergence in the lower level of our framework, a compact BiDVL for visual tasks can be obtained. Our model achieves impressive image generation performance over related works. It also demonstrates the significant capacity of testing image reconstruction and out-of-distribution detection.

Submitted 24 March, 2022; originally announced March 2022.

Comments: CVPR 2022

arXiv:2201.13324 [pdf, other]

Guided Semi-Supervised Non-negative Matrix Factorization on Legal Documents

Authors: Pengyu Li, Christine Tseng, Yaxuan Zheng, Joyce A. Chew, Longxiu Huang, Benjamin Jarman, Deanna Needell

Abstract: Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method through its application to legal documents provided by the California Innocence Project, a nonprofit that works to free innocent convicted persons and reform the justice system. The results show that our proposed method improves both classification accuracy and topic coherence in comparison to past methods like Semi-Supervised Non-negative Matrix Factorization (SSNMF) and Guided Non-negative Matrix Factorization (Guided NMF).

Submitted 31 January, 2022; originally announced January 2022.

Comments: 14 pages, 4 figures

arXiv:2110.15263 [pdf, other]

Coresets for Time Series Clustering

Authors: Lingxiao Huang, K. Sudhir, Nisheeth K. Vishnoi

Abstract: We study the problem of constructing coresets for clustering problems with time series data. This problem has gained importance across many fields including biology, medicine, and economics due to the proliferation of sensors facilitating real-time measurement and rapid drop in storage costs. In particular, we consider the setting where the time series data on $N$ entities is generated from a Gaussian mixture model with autocorrelations over $k$ clusters in $\mathbb{R}^d$. Our main contribution is an algorithm to construct coresets for the maximum likelihood objective for this mixture model. Our algorithm is efficient, and under a mild boundedness assumption on the covariance matrices of the underlying Gaussians, the size of the coreset is independent of the number of entities $N$ and the number of observations for each entity, and depends only polynomially on $k$, $d$ and $1/\varepsilon$, where $\varepsilon$ is the error parameter. We empirically assess the performance of our coreset with synthetic data.

Submitted 28 October, 2021; originally announced October 2021.

Comments: Full version of a paper appearing in NeurIPS 2021

arXiv:2110.14446 [pdf, other]

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Authors: Derek Lim, Felix Hohne, Xiuyu Li, Sijia Linda Huang, Vaishnavi Gupta, Omkar Bhalerao, Ser-Nam Lim

Abstract: Many widely used datasets for graph machine learning tasks have generally been homophilous, where nodes with similar labels connect to each other. Recently, new Graph Neural Networks (GNNs) have been developed that move beyond the homophily regime; however, their evaluation has often been conducted on small graphs with limited application domains. We collect and introduce diverse non-homophilous datasets from a variety of application areas that have up to 384x more nodes and 1398x more edges than prior datasets. We further show that existing scalable graph learning and graph minibatching techniques lead to performance degradation on these non-homophilous datasets, thus highlighting the need for further work on scalable non-homophilous methods. To address these concerns, we introduce LINKX -- a strong simple method that admits straightforward minibatch training and inference. Extensive experimental results with representative simple methods and GNNs across our proposed datasets show that LINKX achieves state-of-the-art performance for learning on non-homophilous graphs. Our codes and data are available at https://github.com/CUAI/Non-Homophily-Large-Scale.

Submitted 27 October, 2021; originally announced October 2021.

Comments: Published at NeurIPS 2021

arXiv:2110.13400

Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays

Authors: Jiatai Huang, Yan Dai, Longbo Huang

Abstract: We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem with unrestricted feedback delays. In contrast to the standard assumption that all losses are $[0,1]$-bounded, in our setting, losses can fall in a general bounded interval $[-L, L]$, unknown to the agent beforehand. Furthermore, the feedback of each arm pull can experience arbitrary delays. We propose a novel approach named Scale-Free Delayed INF (SFD-INF) for this novel setting, which combines a recent "convex combination trick" together with a novel doubling and skipping technique. We then present two instances of SFD-INF, each with carefully designed delay-adapted learning scales. The first one, SFD-TINF, uses $\frac 12$-Tsallis entropy regularizer and can achieve $\widetilde{\mathcal O}(\sqrt{K(D+T)}L)$ regret when the losses are non-negative, where $K$ is the number of actions, $T$ is the number of steps, and $D$ is the total feedback delay. This bound nearly matches the $\Omega((\sqrt{KT}+\sqrt{D\log K})L)$ lower-bound when regarding $K$ as a constant independent of $T$. The second one, SFD-LBINF, works for general scale-free losses and achieves a small-loss style adaptive regret bound $\widetilde{\mathcal O}(\sqrt{K\mathbb{E}[\tilde{\mathfrak L}_T^2]}+\sqrt{KDL})$, which falls to the $\widetilde{\mathcal O}(\sqrt{K(D+T)}L)$ regret in the worst case and is thus more general than SFD-TINF despite a more complicated analysis and several extra logarithmic dependencies. Moreover, both instances also outperform the existing algorithms for non-delayed (i.e., $D=0$) scale-free adversarial MAB problems, which can be of independent interest.

Submitted 25 January, 2023; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: Preliminary work, merged to arXiv:2301.10500

arXiv:2110.05636 [pdf, other]

CAPITAL: Optimal Subgroup Identification via Constrained Policy Tree Search

Authors: Hengrui Cai, Wenbin Lu, Rachel Marceau West, Devan V. Mehrotra, Lingkang Huang

Abstract: Personalized medicine, a paradigm of medicine tailored to a patient's characteristics, is an increasingly attractive field in health care. An important goal of personalized medicine is to identify a subgroup of patients, based on baseline covariates, that benefits more from the targeted treatment than other comparative treatments. Most of the current subgroup identification methods only focus on obtaining a subgroup with an enhanced treatment effect without paying attention to subgroup size. Yet, a clinically meaningful subgroup learning approach should identify the maximum number of patients who can benefit from the better treatment. In this paper, we present an optimal subgroup selection rule (SSR) that maximizes the number of selected patients, and in the meantime, achieves the pre-specified clinically meaningful mean outcome, such as the average treatment effect. We derive two equivalent theoretical forms of the optimal SSR based on the contrast function that describes the treatment-covariates interaction in the outcome. We further propose a ConstrAined PolIcy Tree seArch aLgorithm (CAPITAL) to find the optimal SSR within the interpretable decision tree class. The proposed method is flexible to handle multiple constraints that penalize the inclusion of patients with negative treatment effects, and to address time to event data using the restricted mean survival time as the clinically interesting mean outcome. Extensive simulations, comparison studies, and real data applications are conducted to demonstrate the validity and utility of our method.

Submitted 28 January, 2023; v1 submitted 11 October, 2021; originally announced October 2021.

arXiv:2109.14079 [pdf, other]

Robust recovery of bandlimited graph signals via randomized dynamical sampling

Authors: Longxiu Huang, Deanna Needell, Sui Tang

Abstract: Heat diffusion processes have found wide applications in modelling dynamical systems over graphs. In this paper, we consider the recovery of a $k$-bandlimited graph signal that is an initial signal of a heat diffusion process from its space-time samples. We propose three random space-time sampling regimes, termed dynamical sampling techniques, that consist in selecting a small subset of space-time nodes at random according to some probability distribution. We show that the number of space-time samples required to ensure stable recovery for each regime depends on a parameter called the spectral graph weighted coherence, that depends on the interplay between the dynamics over the graphs and sampling probability distributions. In optimal scenarios, no more than $\mathcal{O}(k \log(k))$ space-time samples are sufficient to ensure accurate and stable recovery of all $k$-bandlimited signals. In any case, dynamical sampling typically requires much fewer spatial samples than the static case by leveraging the temporal information. Then, we propose a computationally efficient method to reconstruct $k$-bandlimited signals from their space-time samples. We prove that it yields accurate reconstructions and that it is also stable to noise. Finally, we test dynamical sampling techniques on a wide variety of graphs. The numerical results support our theoretical findings and demonstrate the efficiency.

Submitted 3 October, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: corrected mistakes in plotting. arXiv admin note: text overlap with arXiv:1511.05118 by other authors

MSC Class: 94A20; 94A12

arXiv:2107.04061 [pdf, other]

Scaling Gaussian Processes with Derivative Information Using Variational Inference

Authors: Misha Padidar, Xinran Zhu, Leo Huang, Jacob R. Gardner, David Bindel

Abstract: Gaussian processes with derivative information are useful in many settings where derivative information is available, including numerous Bayesian optimization and regression tasks that arise in the natural sciences. Incorporating derivative observations, however, comes with a dominating $O(N^3D^3)$ computational cost when training on $N$ points in $D$ input dimensions. This is intractable for even moderately sized problems. While recent work has addressed this intractability in the low-$D$ setting, the high-$N$, high-$D$ setting is still unexplored and of great value, particularly as machine learning problems increasingly become high dimensional. In this paper, we introduce methods to achieve fully scalable Gaussian process regression with derivatives using variational inference. Analogous to the use of inducing values to sparsify the labels of a training set, we introduce the concept of inducing directional derivatives to sparsify the partial derivative information of a training set. This enables us to construct a variational posterior that incorporates derivative information but whose size depends neither on the full dataset size $N$ nor the full dimensionality $D$. We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high dimensional stellarator fusion regression task to training graph convolutional neural networks on Pubmed using Bayesian optimization. Surprisingly, we find that our approach can improve regression performance even in settings where only label data is available.

Submitted 8 July, 2021; originally announced July 2021.

arXiv:2106.08943

Banker Online Mirror Descent

Authors: Jiatai Huang, Longbo Huang

Abstract: We propose Banker-OMD, a novel framework generalizing the classical Online Mirror Descent (OMD) technique in online learning algorithm design. Banker-OMD allows algorithms to robustly handle delayed feedback, and offers a general methodology for achieving $\tilde{O}(\sqrt{T} + \sqrt{D})$-style regret bounds in various delayed-feedback online learning tasks, where $T$ is the time horizon length and $D$ is the total feedback delay. We demonstrate the power of Banker-OMD with applications to three important bandit scenarios with delayed feedback, including delayed adversarial Multi-armed bandits (MAB), delayed adversarial linear bandits, and a novel delayed best-of-both-worlds MAB setting. Banker-OMD achieves nearly-optimal performance in all the three settings. In particular, it leads to the first delayed adversarial linear bandit algorithm achieving $\tilde{O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.

Submitted 25 January, 2023; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: Preliminary work, merged to arXiv:2301.10500

arXiv:2012.07048 [pdf, other]

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

Authors: Siwei Wang, Haoyun Wang, Longbo Huang

Abstract: We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (we call this period as reward interval) and the player receives partial rewards of the action, convoluted with rewards from pulling other arms, successively. Existing results on this model require prior knowledge about the reward interval size as an input to their algorithms. In this paper, we propose adaptive algorithms for both the stochastic and the adversarial cases, without requiring any prior information about the reward interval. For the stochastic case, we prove that our algorithm guarantees a regret that matches the lower bounds (in order). For the adversarial case, we propose the first algorithm to jointly handle non-oblivious adversary and unknown reward interval size. We also conduct simulations based on real-world dataset. The results show that our algorithms outperform existing benchmarks.

Submitted 15 December, 2020; v1 submitted 13 December, 2020; originally announced December 2020.

arXiv:2011.00981 [pdf, other]

Coresets for Regressions with Panel Data

Authors: Lingxiao Huang, K. Sudhir, Nisheeth K. Vishnoi

Abstract: This paper introduces the problem of coresets for regression problems to panel data settings. We first define coresets for several variants of regression problems with panel data and then present efficient algorithms to construct coresets of size that depend polynomially on 1/$\varepsilon$ (where $\varepsilon$ is the error parameter) and the number of regression parameters - independent of the number of individuals in the panel data or the time units each individual is observed for. Our approach is based on the Feldman-Langberg framework in which a key step is to upper bound the "total sensitivity" that is roughly the sum of maximum influences of all individual-time pairs taken over all possible choices of regression parameters. Empirically, we assess our approach with synthetic and real-world datasets; the coreset sizes constructed using our approach are much smaller than the full dataset and coresets indeed accelerate the running time of computing the regression objective.

Submitted 2 November, 2020; v1 submitted 2 November, 2020; originally announced November 2020.

Comments: This is a Full version of a paper to appear in NeurIPS 2020. The code can be found in https://github.com/huanglx12/Coresets-for-regressions-with-panel-data

arXiv:2010.07422 [pdf, other]

doi 10.1109/LSP.2020.3044130

Rapid Robust Principal Component Analysis: CUR Accelerated Inexact Low Rank Estimation

Authors: HanQin Cai, Keaton Hamm, Longxiu Huang, Jiaqi Li, Tao Wang

Abstract: Robust principal component analysis (RPCA) is a widely used tool for dimension reduction. In this work, we propose a novel non-convex algorithm, coined Iterated Robust CUR (IRCUR), for solving RPCA problems, which dramatically improves the computational efficiency in comparison with the existing algorithms. IRCUR achieves this acceleration by employing CUR decomposition when updating the low rank component, which allows us to obtain an accurate low rank approximation via only three small submatrices. Consequently, IRCUR is able to process only the small submatrices and avoid expensive computing on the full matrix through the entire algorithm. Numerical experiments establish the computational advantage of IRCUR over the state-of-art algorithms on both synthetic and real-world datasets.

Submitted 7 February, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

Journal ref: IEEE Signal Processing Letters, 28 (2021): 116-120
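The abstract above mentions recovering a low-rank approximation from only three small submatrices. As a rough illustration of that idea (not the IRCUR algorithm itself, whose details the abstract does not give), the sketch below uses the standard CUR identity: for an exactly rank-$r$ matrix $A$, if the intersection block $U$ of the chosen rows and columns also has rank $r$, then $A = C U^{\dagger} R$. The rank-2 toy matrix, the chosen indices, and all helper names are assumptions made for this example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(m):
    """Invert a 2x2 matrix; raise if it is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("intersection block U is singular; choose other rows/columns")
    return [[d / det, -b / det], [-c / det, a / det]]


def cur_reconstruct(matrix, rows, cols):
    """Rebuild a rank-2 matrix from C (chosen columns), U (intersection), R (chosen rows)."""
    c = [[row[j] for j in cols] for row in matrix]        # m x 2 column submatrix
    r = [matrix[i] for i in rows]                         # 2 x n row submatrix
    u = [[matrix[i][j] for j in cols] for i in rows]      # 2 x 2 intersection block
    # U is square and invertible here, so its pseudo-inverse is the ordinary inverse.
    return matmul(matmul(c, inverse_2x2(u)), r)


# Hypothetical rank-2 test matrix A = X Y (4 x 5).
x = [[1, 0], [0, 1], [1, 1], [2, -1]]
y = [[1, 2, 0, 1, 3], [0, 1, 1, -1, 2]]
a = matmul(x, y)

approx = cur_reconstruct(a, rows=[0, 1], cols=[0, 1])
max_error = max(abs(p - q) for ra, rb in zip(a, approx) for p, q in zip(ra, rb))
```

Only the two selected rows, the two selected columns, and their 2 x 2 intersection are read during reconstruction, which is the source of the computational savings the abstract refers to.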

arXiv:2009.13333 [pdf, other]

Group Whitening: Balancing Learning Efficiency and Representational Capacity

Authors: Lei Huang, Yi Zhou, Li Liu, Fan Zhu, Ling Shao

Abstract: Batch normalization (BN) is an important technique commonly incorporated into deep learning models to perform standardization within mini-batches. The merits of BN in improving a model's learning efficiency can be further amplified by applying whitening, while its drawbacks in estimating population statistics for inference can be avoided through group normalization (GN). This paper proposes group whitening (GW), which exploits the advantages of the whitening operation and avoids the disadvantages of normalization within mini-batches. In addition, we analyze the constraints imposed on features by normalization, and show how the batch size (group number) affects the performance of batch (group) normalized networks, from the perspective of model's representational capacity. This analysis provides theoretical guidance for applying GW in practice. Finally, we apply the proposed GW to ResNet and ResNeXt architectures and conduct experiments on the ImageNet and COCO benchmarks. Results show that GW consistently improves the performance of different architectures, with absolute gains of $1.02\%$ $\sim$ $1.49\%$ in top-1 accuracy on ImageNet and $1.82\%$ $\sim$ $3.21\%$ in bounding box AP on COCO.

Submitted 6 April, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: V4: camera version of CVPR 2021. Code available at: https://github.com/huangleiBuaa/GroupWhitening

arXiv:2009.12836 [pdf, other]

Normalization Techniques in Training DNNs: Methodology, Analysis and Application

Authors: Lei Huang, Jie Qin, Yi Zhou, Fan Zhu, Li Liu, Ling Shao

Abstract: Normalization techniques are essential for accelerating the training and improving the generalization of deep neural networks (DNNs), and have successfully been used in various applications. This paper reviews and comments on the past, present and future of normalization methods in the context of DNN training. We provide a unified picture of the main motivation behind different approaches from the perspective of optimization, and present a taxonomy for understanding the similarities and differences between them. Specifically, we decompose the pipeline of the most representative normalizing activation methods into three components: the normalization area partitioning, normalization operation and normalization representation recovery. In doing so, we provide insight for designing new normalization technique. Finally, we discuss the current progress in understanding normalization methods, and provide a comprehensive review of the applications of normalization for particular tasks, in which it can effectively solve the key issues.

Submitted 27 September, 2020; originally announced September 2020.

Comments: 20 pages

arXiv:2009.09074 [pdf, other]

COVID-19 Literature Topic-Based Search via Hierarchical NMF

Authors: Rachel Grotheer, Yihuan Huang, Pengyu Li, Elizaveta Rebrova, Deanna Needell, Longxiu Huang, Alona Kryshchenko, Xia Li, Kyung Ha, Oleksandr Kryshchenko

Abstract: A dataset of COVID-19-related scientific literature is compiled, combining the articles from several online libraries and selecting those with open access and full text available. Then, hierarchical nonnegative matrix factorization is used to organize literature related to the novel coronavirus into a tree structure that allows researchers to search for relevant literature based on detected topics. We discover eight major latent topics and 52 granular subtopics in the body of literature, related to vaccines, genetic structure and modeling of the disease and patient studies, as well as related diseases and virology. In order that our tool may help current researchers, an interactive website is created that organizes available literature using this hierarchical structure.

Submitted 7 September, 2020; originally announced September 2020.

arXiv:2007.13040 [pdf, other]

Improving Generalization in Meta-learning via Task Augmentation

Authors: Huaxiu Yao, Longkai Huang, Linjun Zhang, Ying Wei, Li Tian, James Zou, Junzhou Huang, Zhenhui Li

Abstract: Meta-learning has proven to be a powerful paradigm for transferring the knowledge from previous tasks to facilitate the learning of a novel task. Current dominant algorithms train a well-generalized model initialization which is adapted to each task via the support set. The crux lies in optimizing the generalization capability of the initialization, which is measured by the performance of the adapted model on the query set of each task. Unfortunately, this generalization measure, evidenced by empirical results, pushes the initialization to overfit the meta-training tasks, which significantly impairs the generalization and adaptation to novel tasks. To address this issue, we actively augment a meta-training task with "more data" when evaluating the generalization. Concretely, we propose two task augmentation methods, including MetaMix and Channel Shuffle. MetaMix linearly combines features and labels of samples from both the support and query sets. For each class of samples, Channel Shuffle randomly replaces a subset of their channels with the corresponding ones from a different class. Theoretical studies show how task augmentation improves the generalization of meta-learning. Moreover, both MetaMix and Channel Shuffle outperform state-of-the-art results by a large margin across many datasets and are compatible with existing meta-learning algorithms.

Submitted 9 June, 2021; v1 submitted 25 July, 2020; originally announced July 2020.

Comments: Accepted by ICML 2021
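The abstract above describes MetaMix as linearly combining features and labels of support and query samples. A minimal sketch of such a convex combination follows; it is only an illustration under stated assumptions, not the authors' implementation. In particular, drawing the mixing weight from a Beta(2, 2) distribution, mixing raw feature vectors, the one-hot labels, and the name `mix_samples` are all hypothetical choices that the abstract does not specify.

```python
import random


def mix_samples(support, query, lam):
    """Return lam * support + (1 - lam) * query, applied to features and labels alike."""
    feat_s, label_s = support
    feat_q, label_q = query
    mixed_feat = [lam * s + (1 - lam) * q for s, q in zip(feat_s, feat_q)]
    mixed_label = [lam * s + (1 - lam) * q for s, q in zip(label_s, label_q)]
    return mixed_feat, mixed_label


rng = random.Random(0)
lam = rng.betavariate(2.0, 2.0)          # assumed mixing distribution, value in (0, 1)

# Hypothetical (features, one-hot label) pairs from the support and query sets.
support = ([1.0, 0.0, 2.0], [1.0, 0.0])
query = ([0.0, 4.0, 2.0], [0.0, 1.0])

mixed_features, mixed_label = mix_samples(support, query, lam)
```

Because the same weight is applied to features and labels, the mixed label remains a valid soft label whose entries sum to one.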

arXiv:2007.04873 [pdf, other]

Invertible Zero-Shot Recognition Flows

Authors: Yuming Shen, Jie Qin, Lei Huang

Abstract: Deep generative models have been successfully applied to Zero-Shot Learning (ZSL) recently. However, the underlying drawbacks of GANs and VAEs (e.g., the hardness of training with ZSL-oriented regularizers and the limited generation quality) hinder the existing generative ZSL models from fully bypassing the seen-unseen bias. To tackle the above limitations, for the first time, this work incorporates a new family of generative models (i.e., flow-based models) into ZSL. The proposed Invertible Zero-shot Flow (IZF) learns factorized data embeddings (i.e., the semantic factors and the non-semantic ones) with the forward pass of an invertible flow network, while the reverse pass generates data samples. This procedure theoretically extends conventional generative flows to a factorized conditional scheme. To explicitly solve the bias problem, our model enlarges the seen-unseen distributional discrepancy based on negative sample-based distance measurement. Notably, IZF works flexibly with either a naive Bayesian classifier or a held-out trainable one for zero-shot recognition. Experiments on widely-adopted ZSL benchmarks demonstrate the significant performance gain of IZF over existing methods, in both classic and generalized settings.

Submitted 9 July, 2020; originally announced July 2020.

Comments: ECCV2020

arXiv:2007.00784 [pdf, other]

Convolutional Neural Network Training with Distributed K-FAC

Authors: J. Gregory Pauloski, Zhao Zhang, Lei Huang, Weijia Xu, Ian T. Foster

Abstract: Training neural networks with many processors can reduce time-to-solution; however, it is challenging to maintain convergence and efficiency at large scales. The Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an approximation of the Fisher Information Matrix that can be used in natural gradient optimizers. We investigate here a scalable K-FAC design and its applicability in convolutional neural network (CNN) training at scale. We study optimization techniques such as layer-wise distribution strategies, inverse-free second-order gradient evaluation, and dynamic K-FAC update decoupling to reduce training time while preserving convergence. We use residual neural networks (ResNet) applied to the CIFAR-10 and ImageNet-1k datasets to evaluate the correctness and scalability of our K-FAC gradient preconditioner. With ResNet-50 on the ImageNet-1k dataset, our distributed K-FAC implementation converges to the 75.9% MLPerf baseline in 18-25% less time than does the classic stochastic gradient descent (SGD) optimizer across scales on a GPU cluster.

Submitted 1 July, 2020; originally announced July 2020.

Comments: To be published in the proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20)

arXiv:2006.12772 [pdf, ps, other]

Combinatorial Pure Exploration of Dueling Bandit

Authors: Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao

Abstract: In this paper, we study combinatorial pure exploration for dueling bandits (CPE-DB): we have multiple candidates for multiple positions as modeled by a bipartite graph, and in each round we sample a duel of two candidates on one position and observe who wins in the duel, with the goal of finding the best candidate-position matching with high probability after multiple rounds of samples. CPE-DB is an adaptation of the original combinatorial pure exploration for multi-armed bandit (CPE-MAB) problem to the dueling bandit setting. We consider both the Borda winner and the Condorcet winner cases. For Borda winner, we establish a reduction of the problem to the original CPE-MAB setting and design PAC and exact algorithms that achieve both the sample complexity similar to that in the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial running time per round. For Condorcet winner, we first design a fully polynomial time approximation scheme (FPTAS) for the offline problem of finding the Condorcet winner with known winning probabilities, and then use the FPTAS as an oracle to design a novel pure exploration algorithm ${\sf CAR}$-${\sf Cond}$ with sample complexity analysis. ${\sf CAR}$-${\sf Cond}$ is the first algorithm with polynomial running time per round for identifying the Condorcet winner in CPE-DB.

Submitted 23 June, 2020; originally announced June 2020.

Comments: Accepted to ICML 2020

arXiv:2006.10254 [pdf, other]

Neural Manifold Ordinary Differential Equations

Authors: Aaron Lou, Derek Lim, Isay Katsman, Leo Huang, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa

Abstract: To better conform to data geometry, recent deep generative modelling techniques adapt Euclidean constructions to non-Euclidean spaces. In this paper, we study normalizing flows on manifolds. Previous work has developed flow models for specific cases; however, these advancements hand craft layers on a manifold-by-manifold basis, restricting generality and inducing cumbersome design constraints. We overcome these issues by introducing Neural Manifold Ordinary Differential Equations, a manifold generalization of Neural ODEs, which enables the construction of Manifold Continuous Normalizing Flows (MCNFs). MCNFs require only local geometry (therefore generalizing to arbitrary manifolds) and compute probabilities with continuous change of variables (allowing for a simple and expressive flow construction). We find that leveraging continuous manifold dynamics produces a marked improvement for both density estimation and downstream tasks.

Submitted 17 June, 2020; originally announced June 2020.

Comments: Submitted to NeurIPS 2020

arXiv:2006.06555 [pdf, ps, other]

Multi-Agent Reinforcement Learning in Stochastic Networked Systems

Authors: Yiheng Lin, Guannan Qu, Longbo Huang, Adam Wierman

Abstract: We study multi-agent reinforcement learning (MARL) in a stochastic network of agents. The objective is to find localized policies that maximize the (discounted) global reward. In general, scalability is a challenge in this setting because the size of the global state/action space can be exponential in the number of agents. Scalable algorithms are only known in cases where dependencies are static, fixed and local, e.g., between neighbors in a fixed, time-invariant underlying graph. In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies can be non-local and stochastic, and provide a finite-time error bound that shows how the convergence rate depends on the speed of information spread in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation, which apply beyond the setting of MARL in networked systems.

Submitted 1 November, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

arXiv:2006.06193 [pdf, other]

Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework

Authors: Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

Abstract: Exploration is essential for reinforcement learning (RL). To face the challenges of exploration, we consider a reward-free RL framework that completely separates exploration from exploitation and brings new challenges for exploration algorithms. In the exploration phase, the agent learns an exploratory policy by interacting with a reward-free environment and collects a dataset of transitions by executing the policy. In the planning phase, the agent computes a good policy for any reward function based on the dataset without further interacting with the environment. This framework is suitable for the meta RL setting where there are many reward functions of interest. In the exploration phase, we propose to maximize the Rényi entropy over the state-action space and justify this objective theoretically. The success of using Rényi entropy as the objective results from its encouragement to explore the hard-to-reach state-actions. We further deduce a policy gradient formulation for this objective and design a practical exploration algorithm that can deal with complex environments. In the planning phase, we solve for good policies given arbitrary reward functions using a batch RL algorithm. Empirically, we show that our exploration algorithm is effective and sample efficient, and results in superior policies for arbitrary reward functions in the planning phase.

Submitted 10 December, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: Accepted by AAAI-21
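
Illustrative note: the exploration objective named in the abstract above is the Rényi entropy of the state-action distribution. The snippet below is a minimal sketch of the textbook definition for a discrete distribution, H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha); it is not the authors' implementation, and the function name `renyi_entropy` and the example distributions are assumptions made here for illustration.

```python
import math


def renyi_entropy(probs, alpha):
    """Renyi entropy (in nats) of a discrete distribution.

    Hypothetical helper for illustration only. `probs` must be
    non-negative and sum to 1; `alpha` must be non-negative.
    alpha == 1 is handled as the Shannon-entropy limit.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probs must sum to 1")
    support = [p for p in probs if p > 0]
    if math.isclose(alpha, 1.0):
        # Limit alpha -> 1 recovers the Shannon entropy.
        return -sum(p * math.log(p) for p in support)
    return math.log(sum(p ** alpha for p in support)) / (1.0 - alpha)


# A uniform distribution has entropy log(n) for every order alpha.
uniform = [0.25, 0.25, 0.25, 0.25]
# A skewed distribution: a small alpha weights the rare outcome more heavily.
skewed = [0.9, 0.1]

h_uniform = renyi_entropy(uniform, 0.5)
h_skewed_low = renyi_entropy(skewed, 0.5)
h_skewed_high = renyi_entropy(skewed, 2.0)
```

For orders alpha below 1 the rarely visited outcomes contribute more to the entropy than they do under larger orders, which is consistent with the abstract's remark that the objective encourages visiting hard-to-reach state-actions.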

arXiv:2006.04778 [pdf, other]

Fair Classification with Noisy Protected Attributes: A Framework with Provable Guarantees

Authors: L. Elisa Celis, Lingxiao Huang, Vijay Keswani, Nisheeth K. Vishnoi

Abstract: We present an optimization framework for learning a fair classifier in the presence of noisy perturbations in the protected attributes. Compared to prior work, our framework can be employed with a very general class of linear and linear-fractional fairness constraints, can handle multiple, non-binary protected attributes, and outputs a classifier that comes with provable guarantees on both accuracy and fairness. Empirically, we show that our framework can be used to attain either statistical rate or false positive rate fairness guarantees with a minimal loss in accuracy, even when the noise is large, in two real-world datasets.

Submitted 16 February, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

arXiv:2006.01424 [pdf, other]

Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining

Authors: Yiqun Mei, Yuchen Fan, Yuqian Zhou, Lichao Huang, Thomas S. Huang, Humphrey Shi

Abstract: Deep convolution-based single image super-resolution (SISR) networks embrace the benefits of learning from large-scale external image resources for local recovery, yet most existing works have ignored the long-range feature-wise similarities in natural images. Some recent works have successfully leveraged this intrinsic feature correlation by exploring non-local attention modules. However, none of the current deep models have studied another inherent property of images: cross-scale feature correlation. In this paper, we propose the first Cross-Scale Non-Local (CS-NL) attention module with integration into a recurrent neural network. By combining the new CS-NL prior with local and in-scale non-local priors in a powerful recurrent fusion cell, we can find more cross-scale feature correlations within a single low-resolution (LR) image. The performance of SISR is significantly improved by exhaustively integrating all possible priors. Extensive experiments demonstrate the effectiveness of the proposed CS-NL module by setting new state-of-the-arts on multiple SISR benchmarks.

Submitted 2 June, 2020; originally announced June 2020.

Comments: CVPR2020

arXiv:2006.00978 [pdf, ps, other]

On the Number of Linear Regions of Convolutional Neural Networks

Authors: H. Xiong, L. Huang, M. Yu, L. Liu, F. Zhu, L. Shao

Abstract: One fundamental problem in deep learning is understanding the outstanding performance of deep Neural Networks (NNs) in practice. One explanation for the superiority of NNs is that they can realize a large class of complicated functions, i.e., they have powerful expressivity. The expressivity of a ReLU NN can be quantified by the maximal number of linear regions it can separate its input space into. In this paper, we provide several mathematical results needed for studying the linear regions of CNNs, and use them to derive the maximal and average numbers of linear regions for one-layer ReLU CNNs. Furthermore, we obtain upper and lower bounds for the number of linear regions of multi-layer ReLU CNNs. Our results suggest that deeper CNNs have more powerful expressivity than their shallow counterparts, while CNNs have more expressivity than fully-connected NNs per parameter.

Submitted 27 June, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: International Conference on Machine Learning (ICML) 2020

arXiv:2002.10319 [pdf, other]

Self-Adaptive Training: beyond Empirical Risk Minimization

Authors: Lang Huang, Chao Zhang, Hongyang Zhang

Abstract: We propose self-adaptive training, a new training algorithm that dynamically corrects problematic training labels by model predictions without incurring extra computational cost, to improve generalization of deep learning for potentially corrupted training data. This problem is crucial towards robustly learning from data that are corrupted by, e.g., label noise and out-of-distribution samples. The standard empirical risk minimization (ERM) for such data, however, may easily overfit the noise and thus suffers from sub-optimal performance. In this paper, we observe that model predictions can substantially benefit the training process: self-adaptive training significantly improves generalization over ERM under various levels of noise, and mitigates the overfitting issue in both natural and adversarial training. We evaluate the error-capacity curve of self-adaptive training: the test error is monotonically decreasing w.r.t. model capacity. This is in sharp contrast to the recently-discovered double-descent phenomenon in ERM, which might be a result of overfitting of noise. Experiments on CIFAR and ImageNet datasets verify the effectiveness of our approach in two applications: classification with label noise and selective classification. We release our code at https://github.com/LayneH/self-adaptive-training.

Submitted 30 September, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

Comments: To appear in NeurIPS 2020

arXiv:2002.02090 [pdf, other]

Faster On-Device Training Using New Federated Momentum Algorithm

Authors: Zhouyuan Huo, Qian Yang, Bin Gu, Lawrence Carin, Heng Huang

Abstract: Mobile crowdsensing has gained significant attention in recent years and has become a critical paradigm for emerging Internet of Things applications. The sensing devices continuously generate a significant quantity of data, which provide tremendous opportunities to develop innovative intelligent applications. To utilize these data to train machine learning models while not compromising user privacy, federated learning has become a promising solution. However, there is little understanding of whether federated learning algorithms are guaranteed to converge. We reconsider model averaging in federated learning and formulate it as a gradient-based method with biased gradients. This novel perspective assists analysis of its convergence rate and provides a new direction for more acceleration. We prove for the first time that the federated averaging algorithm is guaranteed to converge for non-convex problems, without imposing additional assumptions. We further propose a novel accelerated federated learning algorithm and provide a convergence guarantee. Simulated federated learning experiments are conducted to train deep neural networks on benchmark datasets, and experimental results show that our proposed method converges faster than previous approaches.

Submitted 5 February, 2020; originally announced February 2020.

arXiv:2002.00401 [pdf]

Provable Noisy Sparse Subspace Clustering using Greedy Neighbor Selection: A Coherence-Based Perspective

Authors: Jwo-Yuh Wu, Wen-Hsuan Li, Liang-Chi Huang, Yen-** Lin, Chun-Hung Liu, Rung-Hung Gau

Abstract: Sparse subspace clustering (SSC) using greedy-based neighbor selection, such as matching pursuit (MP) and orthogonal matching pursuit (OMP), has been known as a popular computationally-efficient alternative to the conventional L1-minimization based methods. Under deterministic bounded noise corruption, in this paper we derive coherence-based sufficient conditions guaranteeing correct neighbor identification using MP/OMP. Our analyses exploit the maximum/minimum inner product between two noisy data points subject to a known upper bound on the noise level. The obtained sufficient condition clearly reveals the impact of noise on greedy-based neighbor recovery. Specifically, it asserts that, as long as noise is sufficiently small so that the resultant perturbed residual vectors stay close to the desired subspace, both MP and OMP succeed in returning a correct neighbor subset. A striking finding is that, when the ground truth subspaces are well-separated from each other and noise is not large, MP-based iterations, while enjoying lower algorithmic complexity, yield smaller perturbation of residuals, thereby better able to identify correct neighbors and, in turn, achieving higher global data clustering accuracy. Extensive numerical experiments are used to corroborate our theoretical study.

Submitted 2 February, 2020; originally announced February 2020.

arXiv:1911.04207 [pdf, other]

Multi-Path Policy Optimization

Authors: Ling Pan, Qingpeng Cai, Longbo Huang

Abstract: Recent years have witnessed a tremendous improvement of deep reinforcement learning. However, a challenging problem is that an agent may suffer from inefficient exploration, particularly for on-policy methods. Previous exploration methods either rely on complex structure to estimate the novelty of states, or incur sensitive hyper-parameters causing instability. We propose an efficient exploration method, Multi-Path Policy Optimization (MPPO), which does not incur high computation cost and ensures stability. MPPO maintains an efficient mechanism that effectively utilizes a population of diverse policies to enable better exploration, especially in sparse environments. We also give a theoretical guarantee of the stable performance. We build our scheme upon two widely-adopted on-policy methods, the Trust-Region Policy Optimization algorithm and Proximal Policy Optimization algorithm. We conduct extensive experiments on several MuJoCo tasks and their sparsified variants to fairly evaluate the proposed method. Results show that MPPO significantly outperforms state-of-the-art exploration methods in terms of both sample efficiency and final performance.

Submitted 14 February, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

Comments: AAMAS-2020

arXiv:1911.00741 [pdf, other]

Yakovlev Promotion Time Cure Model with Local Polynomial Estimation

Authors: Li-Hsiang Lin, Li-Shan Huang

Abstract: In modeling survival data with a cure fraction, flexible modeling of covariate effects on the probability of cure has important medical implications, which aids investigators in identifying better treatments to cure. This paper studies a semiparametric form of the Yakovlev promotion time cure model that allows for nonlinear effects of a continuous covariate. We adopt the local polynomial approach and use the local likelihood criterion to derive nonlinear estimates of covariate effects on cure rates, assuming that the baseline distribution function follows a parametric form. This way we adopt a flexible method to estimate the cure rate locally, the important part in cure models, and a convenient way to estimate the baseline function globally. An algorithm is proposed to implement estimation at both the local and global scales. Asymptotic properties of local polynomial estimates, the nonparametric part, are investigated in the presence of both censored and cured data, and the parametric part is shown to be root-n consistent. The proposed methods are illustrated by simulated and real data.

Submitted 4 November, 2019; v1 submitted 2 November, 2019; originally announced November 2019.

Comments: 26 pages, 4 figures

MSC Class: 62N99; 62G08

arXiv:1910.09734 [pdf, other]

Single and Union Non-parallel Support Vector Machine Frameworks

Authors: Chun-Na Li, Yuan-Hai Shao, Huajun Wang, Yu-Ting Zhao, Ling-Wei Huang, Naihua Xiu, Nai-Yang Deng

Abstract: Considering the classification problem, we group nonparallel support vector machines with nonparallel hyperplanes into two types of frameworks. The first type constructs the hyperplanes separately. It solves a series of small optimization problems to obtain a series of hyperplanes, but it is hard to measure the loss of each sample. The other type constructs all the hyperplanes simultaneously, and it solves one big optimization problem with the ascertained loss of each sample. We give the characteristics of each framework and compare them carefully. In addition, based on the second framework, we construct a max-min distance-based nonparallel support vector machine for the multiclass classification problem, called NSVM. It constructs hyperplanes with a large distance margin by solving an optimization problem. Experimental results on benchmark data sets show the advantages of our NSVM.

Submitted 25 June, 2021; v1 submitted 21 October, 2019; originally announced October 2019.

arXiv:1909.03276 [pdf, other]

Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions

Authors: Weiyu Cheng, Yanyan Shen, Linpeng Huang

Abstract: Various factorization-based methods have been proposed to leverage second-order, or higher-order cross features for boosting the performance of predictive models. They generally enumerate all the cross features under a predefined maximum order, and then identify useful feature interactions through model training, which suffers from two drawbacks. First, they have to make a trade-off between the expressiveness of higher-order cross features and the computational cost, resulting in suboptimal predictions. Second, enumerating all the cross features, including irrelevant ones, may introduce noisy feature combinations that degrade model performance. In this work, we propose the Adaptive Factorization Network (AFN), a new model that learns arbitrary-order cross features adaptively from data. The core of AFN is a logarithmic transformation layer to convert the power of each feature in a feature combination into the coefficient to be learned. The experimental results on four real datasets demonstrate the superior predictive performance of AFN against the state-of-the-art methods.

Submitted 23 June, 2020; v1 submitted 7 September, 2019; originally announced September 2019.

Comments: Accepted by AAAI'20

arXiv:1906.08484 [pdf, other]

Coresets for Clustering with Fairness Constraints

Authors: Lingxiao Huang, Shaofeng H. -C. Jiang, Nisheeth K. Vishnoi

Abstract: In a recent work, [19] studied the following "fair" variants of classical clustering problems such as $k$-means and $k$-median: given a set of $n$ data points in $\mathbb{R}^d$ and a binary type associated to each data point, the goal is to cluster the points while ensuring that the proportion of each type in each cluster is roughly the same as its underlying proportion. Subsequent work has focused on either extending this setting to when each data point has multiple, non-disjoint sensitive types such as race and gender [6], or to address the problem that the clustering algorithms in the above work do not scale well. The main contribution of this paper is an approach to clustering with fairness constraints that involve multiple, non-disjoint types, that is also scalable. Our approach is based on novel constructions of coresets: for the $k$-median objective, we construct an $\varepsilon$-coreset of size $O(\Gamma k^2 \varepsilon^{-d})$ where $\Gamma$ is the number of distinct collections of groups that a point may belong to, and for the $k$-means objective, we show how to construct an $\varepsilon$-coreset of size $O(\Gamma k^3 \varepsilon^{-d-1})$. The former result is the first known coreset construction for the fair clustering problem with the $k$-median objective, and the latter result removes the dependence on the size of the full dataset as in [39] and generalizes it to multiple, non-disjoint types. Plugging our coresets into existing algorithms for fair clustering such as [5] results in the fastest algorithms for several cases.
Empirically, we assess our approach over the Adult, Bank, Diabetes and Athlete datasets, and show that the coreset sizes are much smaller than the full dataset. We also achieve a speed-up to recent fair clustering algorithms [5,6] by incorporating our coreset construction.

Submitted 17 December, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

arXiv:1903.09296 [pdf]

Patient Clustering Improves Efficiency of Federated Machine Learning to predict mortality and hospital stay time using distributed Electronic Medical Records

Authors: Li Huang, Dianbo Liu

Abstract: Electronic medical records (EMRs) support the development of machine learning algorithms for predicting disease incidence, patient response to treatment, and other healthcare events. But so far most algorithms have been centralized, taking little account of the decentralized, non-identically independently distributed (non-IID), and privacy-sensitive characteristics of EMRs that can complicate data collection, sharing and learning. To address this challenge, we introduced a community-based federated machine learning (CBFL) algorithm and evaluated it on non-IID ICU EMRs. Our algorithm clustered the distributed data into clinically meaningful communities that captured similar diagnoses and geographical locations, and learnt one model for each community. Throughout the learning process, the data was kept local on hospitals, while locally-computed results were aggregated on a server. Evaluation results show that CBFL outperformed the baseline FL algorithm in terms of Area Under the Receiver Operating Characteristic Curve (ROC AUC), Area Under the Precision-Recall Curve (PR AUC), and communication cost between hospitals and the server. Furthermore, communities' performance difference could be explained by how dissimilar one community was to others.

Submitted 21 March, 2019; originally announced March 2019.

arXiv:1903.05926 [pdf, other]

Reinforcement Learning with Dynamic Boltzmann Softmax Updates

Authors: Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu

Abstract: Value function estimation is an important task in reinforcement learning, i.e., prediction. The Boltzmann softmax operator is a natural value estimator and can provide several benefits. However, it does not satisfy the non-expansion property, and its direct use may fail to converge even in value iteration. In this paper, we propose to update the value function with dynamic Boltzmann softmax (DBS) operator, which has good convergence property in the setting of planning and learning. Experimental results on GridWorld show that the DBS operator enables better estimation of the value function, which rectifies the convergence issue of the softmax operator. Finally, we propose the DBS-DQN algorithm by applying dynamic Boltzmann softmax updates in deep Q-network, which outperforms DQN substantially in 40 out of 49 Atari games.

Submitted 8 September, 2019; v1 submitted 14 March, 2019; originally announced March 2019.
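
Illustrative note: the Boltzmann softmax operator referred to in the abstract above is commonly written as boltz_beta(x) = sum_i exp(beta * x_i) * x_i / sum_j exp(beta * x_j). The snippet below is a minimal sketch of that standard operator, plus a "dynamic" variant in which beta grows with the iteration count. It is not the authors' code: the function names and the linear schedule beta_t = t are assumptions chosen here only to show the idea that the operator approaches the max as beta increases.

```python
import math


def boltzmann_softmax(values, beta):
    """Boltzmann-weighted average of `values` with inverse temperature `beta`.

    beta = 0 gives the plain mean; beta -> infinity approaches max(values).
    """
    scaled = [beta * v for v in values]
    shift = max(scaled)  # subtract the max exponent for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def dynamic_boltzmann_softmax(values, step, schedule=lambda t: float(t)):
    """Boltzmann softmax whose beta depends on the iteration index `step`.

    The default linear schedule is a hypothetical choice for illustration;
    the abstract does not state which schedule the paper uses.
    """
    return boltzmann_softmax(values, schedule(step))


q_values = [1.0, 2.0, 3.0]
mean_like = boltzmann_softmax(q_values, 0.0)        # equal weights
early = dynamic_boltzmann_softmax(q_values, 1)      # soft estimate
late = dynamic_boltzmann_softmax(q_values, 50)      # close to max(q_values)
```

With a fixed beta the estimate stays strictly below the maximum whenever the values differ; letting beta grow over iterations moves the estimate toward the max operator used in standard value iteration.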

arXiv:1902.07823 [pdf, other]

Stable and Fair Classification

Authors: Lingxiao Huang, Nisheeth K. Vishnoi

Abstract: Fair classification has been a topic of intense study in machine learning, and several algorithms have been proposed towards this important task. However, in a recent study, Friedler et al. observed that fair classification algorithms may not be stable with respect to variations in the training dataset -- a crucial consideration in several real-world applications. Motivated by their work, we study the problem of designing classification algorithms that are both fair and stable. We propose an extended framework based on fair classification algorithms that are formulated as optimization problems, by introducing a stability-focused regularization term. Theoretically, we prove a stability guarantee, that was lacking in fair classification algorithms, and also provide an accuracy guarantee for our extended framework. Our accuracy guarantee can be used to inform the selection of the regularization parameter in our framework. To the best of our knowledge, this is the first work that combines stability and fairness in automated decision-making tasks. We assess the benefits of our approach empirically by extending several fair classification algorithms that are shown to achieve the best balance between fairness and accuracy over the Adult dataset. Our empirical results show that our framework indeed improves the stability at only a slight sacrifice in accuracy.

Submitted 9 September, 2020; v1 submitted 20 February, 2019; originally announced February 2019.

Showing 1–50 of 66 results for author: Huang, L