Search | arXiv e-print repository

Single-loop Stochastic Algorithms for Difference of Max-Structured Weakly Convex Functions

Authors: Quanqi Hu, Qi Qi, Zhaosong Lu, Tianbao Yang

Abstract: In this paper, we study a class of non-smooth non-convex problems in the form of $\min_{x}[\max_{y\in Y}φ(x, y) - \max_{z\in Z}ψ(x, z)]$, where both $Φ(x) = \max_{y\in Y}φ(x, y)$ and $Ψ(x)=\max_{z\in Z}ψ(x, z)$ are weakly convex functions, and $φ(x, y), ψ(x, z)$ are strongly concave functions in terms of $y$ and $z$, respectively. It covers two families of problems that have been studied but are missing single-loop stochastic algorithms, i.e., difference of weakly convex functions and weakly convex strongly-concave min-max problems. We propose a stochastic Moreau envelope approximate gradient method dubbed SMAG, the first single-loop algorithm for solving these problems, and provide a state-of-the-art non-asymptotic convergence rate. The key idea of the design is to compute an approximate gradient of the Moreau envelopes of $Φ, Ψ$ using only one step of stochastic gradient update of the primal and dual variables. Empirically, we conduct experiments on positive-unlabeled (PU) learning and partial area under ROC curve (pAUC) optimization with an adversarial fairness regularizer to validate the effectiveness of our proposed algorithms.

Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.
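
For readers unfamiliar with the Moreau envelope mentioned in the abstract above, the standard definitions can be sketched as follows. This is textbook material, not the SMAG update itself; the smoothing parameter $γ$ and the weak-convexity constant $ρ$ are symbols assumed here for illustration and do not appear in the listing.

```latex
% Assumption: \Phi is \rho-weakly convex and 0 < \gamma < 1/\rho,
% so the inner minimization is strongly convex and has a unique solution.
\Phi_\gamma(x) = \min_{u} \Big\{ \Phi(u) + \frac{1}{2\gamma}\|u - x\|^2 \Big\},
\qquad
\operatorname{prox}_{\gamma\Phi}(x) = \arg\min_{u} \Big\{ \Phi(u) + \frac{1}{2\gamma}\|u - x\|^2 \Big\},

% The envelope is differentiable even when \Phi is not:
\nabla \Phi_\gamma(x) = \frac{1}{\gamma}\big(x - \operatorname{prox}_{\gamma\Phi}(x)\big).

% For the smoothed difference \Phi_\gamma - \Psi_\gamma (same assumptions on \Psi):
\nabla \Phi_\gamma(x) - \nabla \Psi_\gamma(x)
  = \frac{1}{\gamma}\big(\operatorname{prox}_{\gamma\Psi}(x) - \operatorname{prox}_{\gamma\Phi}(x)\big).
```

Under these assumptions, an approximate gradient of the smoothed objective only requires approximate proximal points of $Φ$ and $Ψ$, which is consistent with the abstract's description of tracking them with stochastic gradient steps.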

arXiv:2401.17646 [pdf, other]

From Sparse to Dense Functional Data: Phase Transitions from a Simultaneous Inference Perspective

Authors: Leheng Cai, Qirui Hu

Abstract: We aim to develop simultaneous inference tools for the mean function of functional data from sparse to dense. First, we derive a unified Gaussian approximation to construct simultaneous confidence bands of mean functions based on the B-spline estimator. Then, we investigate the conditions of phase transitions by decomposing the asymptotic variance of the approximated Gaussian process. As an extension, we also consider the orthogonal series estimator and show the corresponding conditions of phase transitions. Extensive simulation results strongly corroborate the theoretical results, and also illustrate the variation of the asymptotic distribution via the asymptotic variance decomposition we obtain. The developed method is further applied to body fat data and traffic data.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2312.07032 [pdf, ps, other]

Ahpatron: A New Budgeted Online Kernel Learning Machine with Tighter Mistake Bound

Authors: Yun Liao, Junfan Li, Shizhong Liao, Qinghua Hu, Jianwu Dang

Abstract: In this paper, we study the mistake bound of online kernel learning on a budget. We propose a new budgeted online kernel learning model, called Ahpatron, which significantly improves the mistake bound of previous work and resolves the open problem posed by Dekel, Shalev-Shwartz, and Singer (2005). We first present an aggressive variant of Perceptron, named AVP, a model without budget, which uses an active updating rule. Then we design a new budget maintenance mechanism, which removes half of the examples and projects the removed examples onto a hypothesis space spanned by the remaining examples. Ahpatron adopts the above mechanism to approximate AVP. Theoretical analyses prove that Ahpatron has tighter mistake bounds, and experimental results show that Ahpatron outperforms the state-of-the-art algorithms on the same or a smaller budget.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2310.08843 [pdf]

A Longitudinal Analysis about the Effect of Air Pollution on Astigmatism for Children and Young Adults

Authors: Lin An, Qiuyue Hu, Jieying Guan, Yingting Zhu, Chenyao Jiang, Xiaoyun Zhong, Shuyue Ma, Dongmei Yu, Canyang Zhang, Yehong Zhuo, Peiwu Qin

Abstract: Purpose: This study aimed to investigate the correlation between air pollution and astigmatism, considering the detrimental effects of air pollution on respiratory, cardiovascular, and eye health. Methods: A longitudinal study was conducted with 127,709 individuals aged 4-27 years from 9 cities in Guangdong Province, China, spanning from 2019 to 2021. Astigmatism was measured using cylinder values. Multiple measurements were taken at intervals of at least 1 year. Various exposure windows were used to assess the lagged impacts of air pollution on astigmatism. A panel data model with random effects was constructed to analyze the relationship between pollutant exposure and astigmatism. Results: The study revealed significant associations between astigmatism and exposure to carbon monoxide (CO), nitrogen dioxide (NO2), and particulate matter (PM2.5) over time. A 10 μg/m3 increase in a 3-year exposure window of NO2 and PM2.5 was associated with a decrease in cylinder value of -0.045 diopters and -0.017 diopters, respectively. A 0.1 mg/m3 increase in CO concentration within a 2-year exposure window correlated with a decrease in cylinder value of -0.009 diopters. No significant relationships were found between PM10 exposure and astigmatism. Conclusion: This study concluded that greater exposure to NO2 and PM2.5 over longer periods aggravates astigmatism. The negative effect of CO on astigmatism peaks in the exposure window of 2 years prior to examination and diminishes afterward. No significant association was found between PM10 exposure and astigmatism, suggesting that gaseous and smaller particulate pollutants have easier access to human eyes, causing heterogeneous morphological changes to the eyeball.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.03234 [pdf, other]

Non-Smooth Weakly-Convex Finite-sum Coupled Compositional Optimization

Authors: Quanqi Hu, Dixian Zhu, Tianbao Yang

Abstract: This paper investigates new families of compositional optimization problems, called $\underline{\bf n}$on-$\underline{\bf s}$mooth $\underline{\bf w}$eakly-$\underline{\bf c}$onvex $\underline{\bf f}$inite-sum $\underline{\bf c}$oupled $\underline{\bf c}$ompositional $\underline{\bf o}$ptimization (NSWC FCCO). There has been a growing interest in FCCO due to its wide-ranging applications in machine learning and AI, as well as its ability to address the shortcomings of stochastic algorithms based on empirical risk minimization. However, current research on FCCO presumes that both the inner and outer functions are smooth, limiting their potential to tackle a more diverse set of problems. Our research expands on this area by examining non-smooth weakly-convex FCCO, where the outer function is weakly convex and non-decreasing, and the inner function is weakly-convex. We analyze a single-loop algorithm and establish its complexity for finding an $ε$-stationary point of the Moreau envelope of the objective function. Additionally, we extend the algorithm to solving novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, which feature a nested arrangement of three functions. Lastly, we explore the applications of our algorithms in deep learning for two-way partial AUC maximization and multi-instance two-way partial AUC maximization, using empirical studies to showcase the effectiveness of the proposed algorithms.

Submitted 3 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

arXiv:2308.01314 [pdf, other]

Evaluating the Robustness of Test Selection Methods for Deep Neural Networks

Authors: Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Wei Ma, Mike Papadakis, Yves Le Traon

Abstract: Testing deep learning-based systems is crucial but challenging due to the required time and labor for labeling collected raw data. To alleviate the labeling effort, multiple test selection methods have been proposed where only a subset of test data needs to be labeled while satisfying testing requirements. However, we observe that such methods with reported promising results are only evaluated under simple scenarios, e.g., testing on original test data. This brings a question to us: are they always reliable? In this paper, we explore when and to what extent test selection methods fail for testing. Specifically, first, we identify potential pitfalls of 11 selection methods from top-tier venues based on their construction. Second, we conduct a study on five datasets with two model architectures per dataset to empirically confirm the existence of these pitfalls. Furthermore, we demonstrate how pitfalls can break the reliability of these methods. Concretely, methods for fault detection suffer from test data that are: 1) correctly classified but uncertain, or 2) misclassified but confident. Remarkably, the test relative coverage achieved by such methods drops by up to 86.85%. On the other hand, methods for performance estimation are sensitive to the choice of intermediate-layer output. The effectiveness of such methods can be even worse than random selection when using an inappropriate layer.

Submitted 29 July, 2023; originally announced August 2023.

Comments: 12 pages

arXiv:2306.10260 [pdf, other]

Online Local Differential Private Quantile Inference via Self-normalization

Authors: Yi Liu, Qirui Hu, Lei Ding, Bei Jiang, Linglong Kong

Abstract: Based on binary inquiries, we developed an algorithm to estimate population quantiles under Local Differential Privacy (LDP). By self-normalizing, our algorithm provides asymptotically normal estimation with valid inference, resulting in tight confidence intervals without the need for nuisance parameters to be estimated. Our proposed method can be conducted fully online, leading to high computational efficiency and minimal storage requirements with $\mathcal{O}(1)$ space. We also proved an optimality result by an elegant application of one central limit theorem of Gaussian Differential Privacy (GDP) when targeting the frequently encountered median estimation problem. With mathematical proof and extensive numerical testing, we demonstrate the validity of our algorithm both theoretically and experimentally.

Submitted 7 August, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

arXiv:2305.18730 [pdf, other]

Blockwise Stochastic Variance-Reduced Methods with Parallel Speedup for Multi-Block Bilevel Optimization

Authors: Quanqi Hu, Zi-Hao Qiu, Zhishuai Guo, Lijun Zhang, Tianbao Yang

Abstract: In this paper, we consider non-convex multi-block bilevel optimization (MBBO) problems, which involve $m\gg 1$ lower level problems and have important applications in machine learning. Designing a stochastic gradient and controlling its variance is more intricate due to the hierarchical sampling of blocks and data and the unique challenge of estimating hyper-gradient. We aim to achieve three nice properties for our algorithm: (a) matching the state-of-the-art complexity of standard BO problems with a single block; (b) achieving parallel speedup by sampling $I$ blocks and sampling $B$ samples for each sampled block per-iteration; (c) avoiding the computation of the inverse of a high-dimensional Hessian matrix estimator. However, it is non-trivial to achieve all of these by observing that existing works only achieve one or two of these properties. To address the involved challenges for achieving (a, b, c), we propose two stochastic algorithms by using advanced blockwise variance-reduction techniques for tracking the Hessian matrices (for low-dimensional problems) or the Hessian-vector products (for high-dimensional problems), and prove an iteration complexity of $O(\frac{mε^{-3}\mathbb{I}(I<m)}{I\sqrt{I}} + \frac{mε^{-3}}{I\sqrt{B}})$ for finding an $ε$-stationary point under appropriate conditions. We also conduct experiments to verify the effectiveness of the proposed algorithms comparing with existing MBBO algorithms.

Submitted 2 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.11965 [pdf, other]

Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization

Authors: Zi-Hao Qiu, Quanqi Hu, Zhuoning Yuan, Denny Zhou, Lijun Zhang, Tianbao Yang

Abstract: In this paper, we aim to optimize a contrastive loss with individualized temperatures in a principled and systematic manner for self-supervised learning. The common practice of using a global temperature parameter $τ$ ignores the fact that ``not all semantics are created equal", meaning that different anchor data may have different numbers of samples with similar semantics, especially when data exhibits long-tails. First, we propose a new robust contrastive loss inspired by distributionally robust optimization (DRO), providing us an intuition about the effect of $τ$ and a mechanism for automatic temperature individualization. Then, we propose an efficient stochastic algorithm for optimizing the robust contrastive loss with a provable convergence guarantee without using large mini-batch sizes. Theoretical and experimental results show that our algorithm automatically learns a suitable $τ$ for each sample. Specifically, samples with frequent semantics use large temperatures to keep local semantic structures, while samples with rare semantics use small temperatures to induce more separable features. Our method not only outperforms prior strong baselines (e.g., SimCLR, CLIP) on unimodal and bimodal datasets with larger improvements on imbalanced data but also is less sensitive to hyper-parameters. To our best knowledge, this is the first methodical approach to optimizing a contrastive loss with individualized temperatures.

Submitted 19 May, 2023; originally announced May 2023.

Comments: 33 pages, 11 figures, accepted by ICML2023

arXiv:2206.02017 [pdf, ps, other]

Feature screening for multi-response linear models by empirical likelihood

Authors: Jun Lu, Qinqin Hu, Lu Lin

Abstract: This paper proposes a new feature screening method for the multi-response ultrahigh dimensional linear model by empirical likelihood. Through a multivariate moment condition, the empirical likelihood induced ranking statistics can exploit the joint effect among responses, and thus result in a much better performance than the methods considering responses individually. More importantly, by the use of empirical likelihood, the new method adapts to the heterogeneity in the conditional variance of random error. The sure screening property of the newly proposed method is proved with the model size controlled within a reasonable scale. Additionally, the new screening method is also extended to a conditional version so that it can recover the hidden predictors which are easily missed by the unconditional method. The corresponding theoretical properties are also provided. Finally, both numerical studies and real data analysis are provided to illustrate the effectiveness of the proposed methods.

Submitted 4 June, 2022; originally announced June 2022.

arXiv:2206.00260 [pdf, other]

Multi-block Min-max Bilevel Optimization with Applications in Multi-task Deep AUC Maximization

Authors: Quanqi Hu, Yongjian Zhong, Tianbao Yang

Abstract: In this paper, we study multi-block min-max bilevel optimization problems, where the upper level is a non-convex strongly-concave minimax objective and the lower level is a strongly convex objective, and there are multiple blocks of dual variables and lower level problems. Due to the intertwined multi-block min-max bilevel structure, the computational cost at each iteration could be prohibitively high, especially with a large number of blocks. To tackle this challenge, we present a single-loop randomized stochastic algorithm, which requires updates for only a constant number of blocks at each iteration. Under some mild assumptions on the problem, we establish its sample complexity of $O(1/ε^4)$ for finding an $ε$-stationary point. This matches the optimal complexity for solving stochastic nonconvex optimization under a general unbiased stochastic oracle model. Moreover, we provide two applications of the proposed method in multi-task deep AUC (area under ROC curve) maximization and multi-task deep partial AUC maximization. Experimental results validate our theory and demonstrate the effectiveness of our method on problems with hundreds of tasks.

Submitted 17 November, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

arXiv:2203.16261 [pdf]

Packaging, containerization, and virtualization of computational omics methods: Advances, challenges, and opportunities

Authors: Mohammed Alser, Sharon Waymost, Ram Ayyala, Brendan Lawlor, Richard J. Abdill, Neha Rajkumar, Nathan LaPierre, Jaqueline Brito, Andre M. Ribeiro-dos-Santos, Can Firtina, Nour Almadhoun, Varuni Sarwal, Eleazar Eskin, Qiyang Hu, Derek Strong, Byoung-Do Kim, Malak S. Abedalthagafi, Onur Mutlu, Serghei Mangul

Abstract: Omics software tools have reshaped the landscape of modern biology and become an essential component of biomedical research. The increasing dependence of biomedical scientists on these powerful tools creates a need for easier installation and greater usability. Packaging, virtualization, and containerization are different approaches to satisfy this need by wrapping omics tools in additional software that makes the omics tools easier to install and use. Here, we systematically review practices across prominent packaging, virtualization, and containerization platforms. We outline the challenges, advantages, and limitations of each approach and some of the most widely used platforms from the perspectives of users, software developers, and system administrators. We also propose principles to make packaging, virtualization, and containerization of omics software more sustainable and robust to increase the reproducibility of biomedical and life science research.

Submitted 30 March, 2022; originally announced March 2022.

arXiv:2202.12183 [pdf, other]

Large-scale Stochastic Optimization of NDCG Surrogates for Deep Learning with Provable Convergence

Authors: Zi-Hao Qiu, Quanqi Hu, Yongjian Zhong, Lijun Zhang, Tianbao Yang

Abstract: NDCG, namely Normalized Discounted Cumulative Gain, is a widely used ranking metric in information retrieval and machine learning. However, efficient and provable stochastic methods for maximizing NDCG are still lacking, especially for deep models. In this paper, we propose a principled approach to optimize NDCG and its top-$K$ variant. First, we formulate a novel compositional optimization problem for optimizing the NDCG surrogate, and a novel bilevel compositional optimization problem for optimizing the top-$K$ NDCG surrogate. Then, we develop efficient stochastic algorithms with provable convergence guarantees for the non-convex objectives. Different from existing NDCG optimization methods, the per-iteration complexity of our algorithms scales with the mini-batch size instead of the number of total items. To improve the effectiveness for deep learning, we further propose practical strategies by using initial warm-up and stop gradient operator. Experimental results on multiple datasets demonstrate that our methods outperform prior ranking approaches in terms of NDCG. To the best of our knowledge, this is the first time that stochastic algorithms are proposed to optimize NDCG with a provable convergence guarantee. Our proposed methods are implemented in the LibAUC library at https://libauc.org/.

Submitted 2 February, 2023; v1 submitted 24 February, 2022; originally announced February 2022.

Comments: 32 pages, 12 figures; Accepted by ICML2022
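
As background for the entry above, the exact (non-differentiable) NDCG metric that such surrogates approximate can be sketched in a few lines. This is a minimal illustration of the common exponential-gain, log2-discount definition, not the paper's surrogate loss or the LibAUC API; the function names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math


def dcg_at_k(relevances, k=None):
    """Discounted cumulative gain of relevance labels given in ranked order."""
    if k is None:
        k = len(relevances)
    # Gain 2^rel - 1, discounted by log2(rank + 1) with ranks starting at 1.
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(scores, relevances, k=None):
    """NDCG of the ranking induced by sorting items by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    # Convention assumed here: NDCG is 0 when no item is relevant.
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


perfect = ndcg_at_k([3.0, 2.0, 1.0], [2, 1, 0])   # scores agree with labels
reversed_ = ndcg_at_k([1.0, 2.0, 3.0], [2, 1, 0])  # scores invert the labels
```

Because the metric depends on the scores only through the sort order, it is piecewise constant in the model outputs, which is why the abstract speaks of optimizing surrogates rather than NDCG directly.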

arXiv:2112.13356 [pdf, other]

Transfer Learning in High-dimensional Semi-parametric Graphical Models with Application to Brain Connectivity Analysis

Authors: Yong He, Qiushi Li, Qinqin Hu, Lei Liu

Abstract: Transfer learning has drawn growing attention with the target of improving statistical efficiency of one study (dataset) by digging information from similar and related auxiliary studies (datasets). In the article, we consider transfer learning problem in estimating undirected semi-parametric graphical model. We propose an algorithm called Trans-Copula-CLIME for estimating undirected graphical model while digging information from similar auxiliary studies, characterizing the similarity between the target graph and each auxiliary graph by the sparsity of a divergence matrix. The proposed method relaxes the restrictive assumption that data follows a Gaussian distribution, which deviates from reality for the fMRI dataset related to Attention Deficit Hyperactivity Disorder (ADHD) considered here. Nonparametric rank-based correlation coefficient estimators are utilized in the Trans-Copula-CLIME procedure to achieve robustness against normality. We establish the convergence rate of the Trans-Copula-CLIME estimator under some mild conditions, which demonstrates that when the similarity between the auxiliary studies and the target study is sufficiently high and the number of informative auxiliary samples is sufficiently large, then the Trans-Copula-CLIME estimator shows great advantage over the existing non-transfer-learning ones. Simulation studies also show that Trans-Copula-CLIME estimator has better performance especially when data are not from Gaussian distribution. At last, the proposed method is applied to infer functional brain connectivity pattern for ADHD patients in the target Beijing site by leveraging the fMRI datasets from New York site.

Submitted 26 December, 2021; originally announced December 2021.

arXiv:2112.10151 [pdf, ps, other]

doi 10.1214/24-AOS2365

Edge differentially private estimation in the $β$-model via jittering and method of moments

Authors: Jinyuan Chang, Qiao Hu, Eric D. Kolaczyk, Qiwei Yao, Fengting Yi

Abstract: A standing challenge in data privacy is the trade-off between the level of privacy and the efficiency of statistical inference. Here we conduct an in-depth study of this trade-off for parameter estimation in the $β$-model (Chatterjee, Diaconis and Sly, 2011) for edge differentially private network data released via jittering (Karwa, Krivitsky and Slavković, 2017). Unlike most previous approaches based on maximum likelihood estimation for this network model, we proceed via method-of-moments. This choice facilitates our exploration of a substantially broader range of privacy levels - corresponding to stricter privacy - than has been to date. Over this new range we discover our proposed estimator for the parameters exhibits an interesting phase transition, with both its convergence rate and asymptotic variance following one of three different regimes of behavior depending on the level of privacy. Because identification of the operable regime is difficult if not impossible in practice, we devise a novel adaptive bootstrap procedure to construct uniform inference across different phases. In fact, leveraging this bootstrap we are able to provide for simultaneous inference of all parameters in the $β$-model (i.e., equal to the number of nodes), which, to our best knowledge, is the first result of its kind. Numerical experiments confirm the competitive and reliable finite sample performance of the proposed inference methods, next to a comparable maximum likelihood method, as well as significant advantages in terms of computational speed and memory.

Submitted 2 April, 2024; v1 submitted 19 December, 2021; originally announced December 2021.

Journal ref: Annals of Statistics 2024, Vol. 52, pp. 708-728

arXiv:2007.06240 [pdf, other]

Expert Training: Task Hardness Aware Meta-Learning for Few-Shot Classification

Authors: Yucan Zhou, Yu Wang, Jianfei Cai, Yu Zhou, Qinghua Hu, Weiping Wang

Abstract: Deep neural networks are highly effective when a large number of labeled samples are available but fail with few-shot classification tasks. Recently, meta-learning methods have received much attention, which train a meta-learner on massive additional tasks to gain the knowledge to instruct the few-shot classification. Usually, the training tasks are randomly sampled and performed indiscriminately, often making the meta-learner stuck into a bad local optimum. Some works in the optimization of deep neural networks have shown that a better arrangement of training data can make the classifier converge faster and perform better. Inspired by this idea, we propose an easy-to-hard expert meta-training strategy to arrange the training tasks properly, where easy tasks are preferred in the first phase, then, hard tasks are emphasized in the second phase. A task hardness aware module is designed and integrated into the training procedure to estimate the hardness of a task based on the distinguishability of its categories. In addition, we explore multiple hardness measurements including the semantic relation, the pairwise Euclidean distance, the Hausdorff distance, and the Hilbert-Schmidt independence criterion. Experimental results on the miniImageNet and tieredImageNetSketch datasets show that the meta-learners can obtain better results with our expert training strategy.

Submitted 13 July, 2020; originally announced July 2020.

Comments: 9 pages, 6 figures

arXiv:2006.15284 [pdf, other]

Stochastic Batch Augmentation with An Effective Distilled Dynamic Soft Label Regularizer

Authors: Qian Li, Qingyuan Hu, Yong Qi, Saiyu Qi, Jie Ma, Jian Zhang

Abstract: Data augmentation has been intensively used in training deep neural networks to improve the generalization, whether in original space (e.g., image space) or representation space. Although being successful, the connection between the synthesized data and the original data is largely ignored in training, without considering the distribution information that the synthesized samples are surrounding the original sample in training. Hence, the behavior of the network is not optimized for this. However, that behavior is crucially important for generalization, even in the adversarial setting, for the safety of the deep learning system. In this work, we propose a framework called Stochastic Batch Augmentation (SBA) to address these problems. SBA stochastically decides whether to augment at iterations controlled by the batch scheduler and in which a ''distilled'' dynamic soft label regularization is introduced by incorporating the similarity in the vicinity distribution with respect to raw samples. The proposed regularization provides direct supervision by the KL-Divergence between the output soft-max distributions of original and virtual data. Our experiments on CIFAR-10, CIFAR-100, and ImageNet show that SBA can improve the generalization of the neural networks and speed up the convergence of network training.

Submitted 27 June, 2020; originally announced June 2020.

Comments: Accepted by IJCAI 2020. The sole copyright holder is IJCAI (International Joint Conferences on Artificial Intelligence)

arXiv:2006.13044 [pdf, other]

Scheduling Policy and Power Allocation for Federated Learning in NOMA Based MEC

Authors: Xiang Ma, Haijian Sun, Rose Qingyang Hu

Abstract: Federated learning (FL) is a highly pursued machine learning technique that can train a model centrally while keeping data distributed. Distributed computation makes FL attractive for bandwidth-limited applications, especially in wireless communications. There can be a large number of distributed edge devices connected to a central parameter server (PS) that iteratively download/upload data from/to the PS. Due to the limited bandwidth, only a subset of connected devices can be scheduled in each round. There are usually millions of parameters in state-of-the-art machine learning models such as deep learning, resulting in a high computation complexity as well as a high communication burden on collecting/distributing data for training. To improve communication efficiency and make the training model converge faster, we propose a new scheduling policy and power allocation scheme using non-orthogonal multiple access (NOMA) settings to maximize the weighted sum data rate under practical constraints during the entire learning process. NOMA allows multiple users to transmit on the same channel simultaneously. The user scheduling problem is transformed into a maximum-weight independent set problem that can be solved using graph theory. Simulation results show that the proposed scheduling and power allocation scheme can help achieve a higher FL testing accuracy in NOMA-based wireless networks than other existing schemes.

Submitted 21 June, 2020; originally announced June 2020.

arXiv:2001.06576 [pdf, other]

Inference for Network Structure and Dynamics from Time Series Data via Graph Neural Network

Authors: Mengyuan Chen, Jiang Zhang, Zhang Zhang, Lun Du, Qiao Hu, Shuo Wang, Jiaqi Zhu

Abstract: Network structures in various backgrounds play important roles in social, technological, and biological systems. However, the observable network structures in real cases are often incomplete or unavailable due to measurement errors or privacy protection issues. Therefore, inferring the complete network structure is useful for understanding complex systems. Existing studies have not fully solved the problem of inferring network structure with partial or no information about connections or nodes. In this paper, we tackle the problem by utilizing time series data generated by network dynamics. We regard the network inference problem based on dynamical time series data as a problem of minimizing errors in predicting future states, and propose a novel data-driven deep learning model called Gumbel Graph Network (GGN) to solve two kinds of network inference problems: Network Reconstruction and Network Completion. For the network reconstruction problem, the GGN framework includes two modules: the dynamics learner and the network generator. For the network completion problem, GGN adds a new module called the States Learner to infer missing parts of the network. We carried out experiments on discrete and continuous time series data. The experiments show that our method can reconstruct up to 100% of the network structure on the network reconstruction task. The model can also infer the unknown parts of the structure with up to 90% accuracy when some nodes are missing, and the accuracy decays as the fraction of missing nodes increases. Our framework may have wide application areas where the network structure is hard to obtain and the time series data is rich.

Submitted 17 January, 2020; originally announced January 2020.

arXiv:1906.03768 [pdf, ps, other]

A cost-reducing partial labeling estimator in text classification problem

Authors: Jiangning Chen, Zhibo Dai, Juntao Duan, Qianli Hu, Ruilin Li, Heinrich Matzinger, Ionel Popescu, Haoyan Zhai

Abstract: We propose a new approach to address text classification problems when learning with partial labels is beneficial. Instead of offering each training sample a set of candidate labels, we assign negative-oriented labels to the ambiguous training examples if they are unlikely to fall into certain classes. We construct our new maximum likelihood estimators with a self-correction property, and prove that under some conditions, our estimators converge faster. We also discuss the advantages of applying one of our estimators to a fully supervised learning problem. The proposed method has potential applicability in many areas, such as crowdsourcing, natural language processing and medical image analysis.

Submitted 9 June, 2019; originally announced June 2019.

arXiv:1812.08217 [pdf, ps, other]

doi 10.1016/j.jeconom.2022.06.010

Optimal covariance matrix estimation for high-dimensional noise in high-frequency data

Authors: Jinyuan Chang, Qiao Hu, Cheng Liu, Cheng Yong Tang

Abstract: We consider high-dimensional measurement errors with high-frequency data. Our objective is to recover the high-dimensional cross-sectional covariance matrix of the random errors with optimality. In this problem, not all components of the random vector are observed at the same time and the measurement errors are latent variables, leading to major challenges besides high data dimensionality. We propose a new covariance matrix estimator in this context with appropriate localization and thresholding, and then conduct a series of comprehensive theoretical investigations of the proposed estimator. By developing a new technical device integrating the high-frequency data feature with the conventional notion of $α$-mixing, our analysis successfully accommodates the challenging serial dependence in the measurement errors. Our theoretical analysis establishes the minimax optimal convergence rates associated with two commonly used loss functions, and we demonstrate with concrete cases when the proposed localized estimator with thresholding achieves the minimax optimal convergence rates. Considering that the variances and covariances can be small in reality, we conduct a second-order theoretical analysis that further disentangles the dominating bias in the estimator. A bias-corrected estimator is then proposed to ensure its practical finite sample performance. We also extensively analyze our estimator in the setting with jumps, and show that its performance is reasonably robust. We illustrate the promising empirical performance of the proposed estimator with extensive simulation studies and a real data analysis.

Submitted 10 September, 2022; v1 submitted 19 December, 2018; originally announced December 2018.

Journal ref: Journal of Econometrics 2024, Vol. 239, 105329

arXiv:1810.13192 [pdf, ps, other]

Nearly-tight bounds on linear regions of piecewise linear neural networks

Authors: Qiang Hu, Hao Zhang

Abstract: The development of deep neural networks (DNNs) in recent years has ushered in a brand new era of artificial intelligence. DNNs have proved to be excellent at solving very complex problems, e.g., visual recognition and text understanding, to the extent of competing with or even surpassing people. Despite the inspiring and encouraging success of DNNs, thorough theoretical analyses that unravel the mystery of their magic are still lacking. The design of DNN structure is dominated by empirical results in terms of network depth, number of neurons and activations. A few remarkable works published recently in an attempt to interpret DNNs have established the first glimpses of their internal mechanisms. Nevertheless, research on exploring how DNNs operate is still at the initial stage with plenty of room for refinement. In this paper, we extend precedent research on neural networks with piecewise linear activations (PLNN) concerning linear region bounds. We present (i) the exact maximal number of linear regions for single-layer PLNNs; (ii) an upper bound for multi-layer PLNNs; and (iii) a tighter upper bound for the maximal number of linear regions on rectifier networks. The derived bounds also indirectly explain why deep models are more powerful than shallow counterparts, and how the non-linearity of activation functions impacts the expressiveness of networks.

Submitted 26 December, 2018; v1 submitted 31 October, 2018; originally announced October 2018.

Comments: Counting linear regions of neural networks

arXiv:1806.00580 [pdf, other]

Detecting Adversarial Examples via Key-based Network

Authors: Pinlong Zhao, Zhouyu Fu, Ou Wu, Qinghua Hu, Jun Wang

Abstract: Though deep neural networks have achieved state-of-the-art performance in visual classification, recent studies have shown that they are all vulnerable to the attack of adversarial examples. Small and often imperceptible perturbations to the input images are sufficient to fool the most powerful deep neural networks. Various defense methods have been proposed to address this issue. However, they either require knowledge of the process of generating adversarial examples, or are not robust against new attacks specifically designed to penetrate the existing defense. In this work, we introduce the key-based network, a new detection-based defense mechanism to distinguish adversarial examples from normal ones based on error correcting output codes, using the binary code vectors produced by multiple binary classifiers applied to randomly chosen label-sets as signatures to match normal images and reject adversarial examples. In contrast to existing defense methods, the proposed method does not require knowledge of the process for generating adversarial examples and can be applied to defend against different types of attacks. For the practical black-box and gray-box scenarios, where the attacker does not know the encoding scheme, we show empirically that the key-based network can effectively detect adversarial examples generated by several state-of-the-art attacks.

Submitted 2 June, 2018; originally announced June 2018.

Comments: 6 pages

Showing 1–23 of 23 results for author: Hu, Q