Search | arXiv e-print repository

UIFV: Data Reconstruction Attack in Vertical Federated Learning

Authors: Jirui Yang, Peng Chen, Zhihui Lu, Qiang Duan, Yubing Bao

Abstract: Vertical Federated Learning (VFL) facilitates collaborative machine learning without the need for participants to share raw private data. However, recent studies have revealed privacy risks where adversaries might reconstruct sensitive features through data leakage during the learning process. Although data reconstruction methods based on gradient or model information are somewhat effective, they reveal limitations in VFL application scenarios. This is because these traditional methods heavily rely on specific model structures and/or have strict limitations on application scenarios. To address this, our study introduces the Unified InverNet Framework into VFL, which yields a novel and flexible approach (dubbed UIFV) that leverages intermediate feature data to reconstruct original data, instead of relying on gradients or model details. The intermediate feature data is the feature exchanged by different participants during the inference phase of VFL. Experiments on four datasets demonstrate that our methods significantly outperform state-of-the-art techniques in attack precision. Our work exposes severe privacy vulnerabilities within VFL systems that pose real threats to practical VFL applications and thus confirms the necessity of further enhancing privacy protection in the VFL architecture.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.01799 [pdf, other]

Online Control in Population Dynamics

Authors: Noah Golowich, Elad Hazan, Zhou Lu, Dhruv Rohatgi, Y. Jennifer Sun

Abstract: The study of population dynamics originated with early sociological works but has since extended into many fields, including biology, epidemiology, evolutionary game theory, and economics. Most studies on population dynamics focus on the problem of prediction rather than control. Existing mathematical models for control in population dynamics are often restricted to specific, noise-free dynamics, while real-world population changes can be complex and adversarial. To address this gap, we propose a new framework based on the paradigm of online control. We first characterize a set of linear dynamical systems that can naturally model evolving populations. We then give an efficient gradient-based controller for these systems, with near-optimal regret bounds with respect to a broad class of linear policies. Our empirical evaluations demonstrate the effectiveness of the proposed algorithm for control in population dynamics even for non-linear models such as SIR and replicator dynamics.

Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.18577 [pdf, other]

Single-loop Stochastic Algorithms for Difference of Max-Structured Weakly Convex Functions

Authors: Quanqi Hu, Qi Qi, Zhaosong Lu, Tianbao Yang

Abstract: In this paper, we study a class of non-smooth non-convex problems in the form of $\min_{x}[\max_{y\in Y}φ(x, y) - \max_{z\in Z}ψ(x, z)]$, where both $Φ(x) = \max_{y\in Y}φ(x, y)$ and $Ψ(x)=\max_{z\in Z}ψ(x, z)$ are weakly convex functions, and $φ(x, y), ψ(x, z)$ are strongly concave functions in terms of $y$ and $z$, respectively. It covers two families of problems that have been studied but are missing single-loop stochastic algorithms, i.e., difference of weakly convex functions and weakly convex strongly-concave min-max problems. We propose a stochastic Moreau envelope approximate gradient method dubbed SMAG, the first single-loop algorithm for solving these problems, and provide a state-of-the-art non-asymptotic convergence rate. The key idea of the design is to compute an approximate gradient of the Moreau envelopes of $Φ, Ψ$ using only one step of stochastic gradient update of the primal and dual variables. Empirically, we conduct experiments on positive-unlabeled (PU) learning and partial area under ROC curve (pAUC) optimization with an adversarial fairness regularizer to validate the effectiveness of our proposed algorithms.

Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

arXiv:2404.13177 [pdf, other]

A Bayesian Hybrid Design with Borrowing from Historical Study

Authors: Zhaohua Lu, John Toso, Girma Ayele, Philip He

Abstract: In early phase drug development of combination therapy, the primary objective is to preliminarily assess whether there is additive activity when a novel agent is combined with an established monotherapy. Due to potential feasibility issues with a large randomized study, uncontrolled single-arm trials have been the mainstream approach in cancer clinical trials. However, such trials often present significant challenges in deciding whether to proceed to the next phase of development. A hybrid design, leveraging data from a completed historical clinical study of the monotherapy, offers a valuable option to enhance study efficiency and improve informed decision-making. Compared to traditional single-arm designs, the hybrid design may significantly enhance power by borrowing external information, enabling a more robust assessment of activity. The primary challenge of hybrid design lies in handling information borrowing. We introduce a Bayesian dynamic power prior (DPP) framework with three components for controlling the amount of dynamic borrowing. The framework offers flexible study design options with explicit interpretation of borrowing, allowing customization according to specific needs. Furthermore, the posterior distribution in the proposed framework has a closed form, offering significant advantages in computational efficiency. The proposed framework's utility is demonstrated through simulations and a case study.

Submitted 29 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.
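
Illustrative note (not from the paper): the abstract above says the posterior under the proposed framework has a closed form. The sketch below shows why conjugacy gives a closed form in the simplest related setting: a classical fixed-weight power prior for a binary endpoint with a Beta initial prior. It is not the authors' dynamic power prior; the function names, the fixed discount `delta`, and the example counts are all assumptions made for illustration.

```python
def power_prior_posterior(y, n, y0, n0, delta, a=1.0, b=1.0):
    """Beta posterior for a response rate under a fixed-weight power prior.

    y, n   : responders and sample size in the current study
    y0, n0 : responders and sample size in the historical study
    delta  : borrowing weight in [0, 1] (0 = ignore history, 1 = pool fully)
    a, b   : parameters of the initial Beta(a, b) prior (assumed, not from the paper)
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    # Raising the historical binomial likelihood to the power delta keeps
    # the Beta family, so the posterior parameters are simple sums.
    alpha = a + y + delta * y0
    beta = b + (n - y) + delta * (n0 - y0)
    return alpha, beta


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical counts: 12/30 responders now, 40/100 in the historical study.
no_borrowing = power_prior_posterior(12, 30, 40, 100, delta=0.0)
half_borrowing = power_prior_posterior(12, 30, 40, 100, delta=0.5)
posterior_mean = beta_mean(*half_borrowing)
```

With `delta=0.0` the historical data drop out entirely. Larger values of `delta` add a proportional share of the historical responders and non-responders to the posterior counts.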

arXiv:2402.02701 [pdf, other]

Understanding What Affects Generalization Gap in Visual Reinforcement Learning: Theory and Empirical Evidence

Authors: Jiafei Lyu, Le Wan, Xiu Li, Zongqing Lu

Abstract: Recently, there are many efforts attempting to learn useful policies for continuous control in visual reinforcement learning (RL). In this scenario, it is important to learn a generalizable policy, as the testing environment may differ from the training environment, e.g., there exist distractors during deployment. Many practical algorithms are proposed to handle this problem. However, to the best of our knowledge, none of them provide a theoretical understanding of what affects the generalization gap and why their proposed methods work. In this paper, we bridge this issue by theoretically answering the key factors that contribute to the generalization gap when the testing environment has distractors. Our theories indicate that minimizing the representation distance between training and testing environments, which aligns with human intuition, is the most critical for the benefit of reducing the generalization gap. Our theoretical results are supported by the empirical evidence in the DMControl Generalization Benchmark (DMC-GB).

Submitted 4 February, 2024; originally announced February 2024.

Comments: Part of this work is accepted as AAMAS 2024 extended abstract

arXiv:2401.06904 [pdf]

Non-collapsibility and Built-in Selection Bias of Hazard Ratio in Randomized Controlled Trials

Authors: Helen Bian, Menglan Pang, Guanbo Wang, Zihang Lu

Abstract: Background: The hazard ratio of the Cox proportional hazards model is widely used in randomized controlled trials to assess treatment effects. However, two properties of the hazard ratio including the non-collapsibility and built-in selection bias need to be further investigated. Methods: We conduct simulations to differentiate the non-collapsibility effect and built-in selection bias from the difference between the marginal and the conditional hazard ratio. Meanwhile, we explore the performance of the Cox model with inverse probability of treatment weighting for covariate adjustment when estimating the marginal hazard ratio. The built-in selection bias is further assessed in the period-specific hazard ratio. Results: The conditional hazard ratio is a biased estimate of the marginal effect due to the non-collapsibility property. In contrast, the hazard ratio estimated from the inverse probability of treatment weighting Cox model provides an unbiased estimate of the true marginal hazard ratio. The built-in selection bias only manifests in the period-specific hazard ratios even when the proportional hazards assumption is satisfied. The Cox model with inverse probability of treatment weighting can be used to account for confounding bias and provide an unbiased effect under the randomized controlled trials setting when the parameter of interest is the marginal effect. Conclusions: We propose that the period-specific hazard ratios should always be avoided due to the profound effects of built-in selection bias.

Submitted 12 January, 2024; originally announced January 2024.

Comments: 17 pages, 2 figures

arXiv:2311.14655 [pdf, other]

A Sparse Factor Model for Clustering High-Dimensional Longitudinal Data

Authors: Zihang Lu, Noirrit Kiran Chandra

Abstract: Recent advances in engineering technologies have enabled the collection of a large number of longitudinal features. This wealth of information presents unique opportunities for researchers to investigate the complex nature of diseases and uncover underlying disease mechanisms. However, analyzing this kind of data can be difficult due to its high dimensionality, heterogeneity and computational challenges. In this paper, we propose a Bayesian nonparametric mixture model for clustering high-dimensional mixed-type (e.g., continuous, discrete and categorical) longitudinal features. We employ a sparse factor model on the joint distribution of random effects and the key idea is to induce clustering at the latent factor level instead of the original data to escape the curse of dimensionality. The number of clusters is estimated through a Dirichlet process prior. An efficient Gibbs sampler is developed to estimate the posterior distribution of the model parameters. Analysis of real and simulated data is presented and discussed. Our study demonstrates that the proposed model serves as a useful analytical tool for clustering high-dimensional longitudinal data.

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2311.06928 [pdf, other]

Attention for Causal Relationship Discovery from Biological Neural Dynamics

Authors: Ziyu Lu, Anika Tabassum, Shruti Kulkarni, Lu Mi, J. Nathan Kutz, Eric Shea-Brown, Seung-Hwan Lim

Abstract: This paper explores the potential of the transformer models for learning Granger causality in networks with complex nonlinear dynamics at every node, as in neurobiological and biophysical networks. Our study primarily focuses on a proof-of-concept investigation based on simulated neural dynamics, for which the ground-truth causality is known through the underlying connectivity matrix. For transformer models trained to forecast neuronal population dynamics, we show that the cross attention module effectively captures the causal relationship among neurons, with an accuracy equal or superior to that for the most popular Granger causality analysis method. While we acknowledge that real-world neurobiology data will bring further challenges, including dynamic connectivity and unobserved variability, this research offers an encouraging preliminary glimpse into the utility of the transformer model for causal representation learning in neuroscience.

Submitted 23 November, 2023; v1 submitted 12 November, 2023; originally announced November 2023.

Comments: Accepted to the NeurIPS 2023 Workshop on Causal Representation Learning

arXiv:2309.16578 [pdf, other]

doi 10.1038/s43588-024-00605-8

Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning

Authors: He Zhang, Siyuan Liu, Jiacheng You, Chang Liu, Shuxin Zheng, Ziheng Lu, Tong Wang, Nanning Zheng, Bin Shao

Abstract: Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. Here we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep learning functional model. We build the essential non-locality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves a comparable accuracy with Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those seen in training, which unleashes the appealing scaling of OFDFT for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.

Submitted 9 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: Published in Nature Computational Science, March 2024. Full paper with supplementary information

arXiv:2305.10187 [pdf, other]

Evaluating Dynamic Conditional Quantile Treatment Effects with Applications in Ridesharing

Authors: Ting Li, Chengchun Shi, Zhaohua Lu, Yi Li, Hongtu Zhu

Abstract: Many modern tech companies, such as Google, Uber, and Didi, utilize online experiments (also known as A/B testing) to evaluate new policies against existing ones. While most studies concentrate on average treatment effects, situations with skewed and heavy-tailed outcome distributions may benefit from alternative criteria, such as quantiles. However, assessing dynamic quantile treatment effects (QTE) remains a challenge, particularly when dealing with data from ride-sourcing platforms that involve sequential decision-making across time and space. In this paper, we establish a formal framework to calculate QTE conditional on characteristics independent of the treatment. Under specific model assumptions, we demonstrate that the dynamic conditional QTE (CQTE) equals the sum of individual CQTEs across time, even though the conditional quantile of cumulative rewards may not necessarily equate to the sum of conditional quantiles of individual rewards. This crucial insight significantly streamlines the estimation and inference processes for our target causal estimand. We then introduce two varying coefficient decision process (VCDP) models and devise an innovative method to test the dynamic CQTE. Moreover, we expand our approach to accommodate data from spatiotemporal dependent experiments and examine both conditional quantile direct and indirect effects. To showcase the practical utility of our method, we apply it to three real-world datasets from a ride-sourcing platform. Theoretical findings and comprehensive simulation studies further substantiate our proposal.

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2302.11032 [pdf, other]

Boosting Nyström Method

Authors: Keaton Hamm, Zhaoying Lu, Wenbo Ouyang, Hao Helen Zhang

Abstract: The Nyström method is an effective tool to generate low-rank approximations of large matrices, and it is particularly useful for kernel-based learning. To improve the standard Nyström approximation, ensemble Nyström algorithms compute a mixture of Nyström approximations which are generated independently based on column resampling. We propose a new family of algorithms, boosting Nyström, which iteratively generate multiple "weak" Nyström approximations (each using a small number of columns) in a sequence adaptively - each approximation aims to compensate for the weaknesses of its predecessor - and then combine them to form one strong approximation. We demonstrate that our boosting Nyström algorithms can yield more efficient and accurate low-rank approximations to kernel matrices. Improvements over the standard and ensemble Nyström methods are illustrated by simulation studies and real-world data analysis.

Submitted 21 February, 2023; originally announced February 2023.

arXiv:2301.04204 [pdf, ps, other]

A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization

Authors: Chuan He, Heng Huang, Zhaosong Lu

Abstract: In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization that minimizes a twice differentiable function subject to nonlinear equality constraints and also a convex conic constraint. In particular, we propose a Newton-conjugate gradient (Newton-CG) based barrier-augmented Lagrangian method for finding an approximate SOSP of this problem. Under some mild assumptions, we show that our method enjoys a total inner iteration complexity of $\widetilde{\cal O}(ε^{-11/2})$ and an operation complexity of $\widetilde{\cal O}(ε^{-11/2}\min\{n,ε^{-5/4}\})$ for finding an $(ε,\sqrt{ε})$-SOSP of general nonconvex conic optimization with high probability. Moreover, under a constraint qualification, these complexity bounds are improved to $\widetilde{\cal O}(ε^{-7/2})$ and $\widetilde{\cal O}(ε^{-7/2}\min\{n,ε^{-3/4}\})$, respectively. To the best of our knowledge, this is the first study on the complexity of finding an approximate SOSP of general nonconvex conic optimization. Preliminary numerical results are presented to demonstrate superiority of the proposed method over first-order methods in terms of solution quality.

Submitted 10 January, 2023; originally announced January 2023.

Comments: 34 pages. arXiv admin note: substantial text overlap with arXiv:2301.03139

MSC Class: 49M05; 49M15; 68Q25; 90C26; 90C30; 90C60

arXiv:2301.03139 [pdf, ps, other]

A Newton-CG based augmented Lagrangian method for finding a second-order stationary point of nonconvex equality constrained optimization with complexity guarantees

Authors: Chuan He, Zhaosong Lu, Ting Kei Pong

Abstract: In this paper we consider finding a second-order stationary point (SOSP) of nonconvex equality constrained optimization when a nearly feasible point is known. In particular, we first propose a new Newton-CG method for finding an approximate SOSP of unconstrained optimization and show that it enjoys a substantially better complexity than the Newton-CG method [56]. We then propose a Newton-CG based augmented Lagrangian (AL) method for finding an approximate SOSP of nonconvex equality constrained optimization, in which the proposed Newton-CG method is used as a subproblem solver. We show that under a generalized linear independence constraint qualification (GLICQ), our AL method enjoys a total inner iteration complexity of $\widetilde{\cal O}(ε^{-7/2})$ and an operation complexity of $\widetilde{\cal O}(ε^{-7/2}\min\{n,ε^{-3/4}\})$ for finding an $(ε,\sqrt{ε})$-SOSP of nonconvex equality constrained optimization with high probability, which are significantly better than the ones achieved by the proximal AL method [60]. Besides, we show that it has a total inner iteration complexity of $\widetilde{\cal O}(ε^{-11/2})$ and an operation complexity of $\widetilde{\cal O}(ε^{-11/2}\min\{n,ε^{-5/4}\})$ when the GLICQ does not hold. To the best of our knowledge, all the complexity results obtained in this paper are new for finding an approximate SOSP of nonconvex equality constrained optimization with high probability. Preliminary numerical results also demonstrate the superiority of our proposed methods over the ones in [56,60].

Submitted 8 January, 2023; originally announced January 2023.

Comments: 29 pages, accepted by SIAM Journal on Optimization

MSC Class: 49M15; 68Q25; 90C06; 90C26; 90C30; 90C60

arXiv:2301.02060 [pdf, ps, other]

A first-order augmented Lagrangian method for constrained minimax optimization

Authors: Zhaosong Lu, Sanyou Mei

Abstract: In this paper we study a class of constrained minimax problems. In particular, we propose a first-order augmented Lagrangian method for solving them, whose subproblems turn out to be a much simpler structured minimax problem and are suitably solved by a first-order method recently developed in [26] by the authors. Under some suitable assumptions, an \emph{operation complexity} of ${\cal O}(\varepsilon^{-4}\log\varepsilon^{-1})$, measured by its fundamental operations, is established for the first-order augmented Lagrangian method for finding an $\varepsilon$-KKT solution of the constrained minimax problems.

Submitted 17 April, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

Comments: added a new section. arXiv admin note: substantial text overlap with arXiv:2301.01716

MSC Class: 90C26; 90C30; 90C47; 90C99; 65K05

arXiv:2301.01716 [pdf, ps, other]

First-order penalty methods for bilevel optimization

Authors: Zhaosong Lu, Sanyou Mei

Abstract: In this paper we study a class of unconstrained and constrained bilevel optimization problems in which the lower level is a possibly nonsmooth convex optimization problem, while the upper level is a possibly nonconvex optimization problem. We introduce a notion of $\varepsilon$-KKT solution for them and show that an $\varepsilon$-KKT solution leads to an $O(\sqrt{\varepsilon})$- or $O(\varepsilon)$-hypergradient based stationary point under suitable assumptions. We also propose first-order penalty methods for finding an $\varepsilon$-KKT solution of them, whose subproblems turn out to be a structured minimax problem and can be suitably solved by a first-order method recently developed by the authors. Under suitable assumptions, an \emph{operation complexity} of $O(\varepsilon^{-4}\log\varepsilon^{-1})$ and $O(\varepsilon^{-7}\log\varepsilon^{-1})$, measured by their fundamental operations, is established for the proposed penalty methods for finding an $\varepsilon$-KKT solution of the unconstrained and constrained bilevel optimization problems, respectively. Preliminary numerical results are presented to illustrate the performance of our proposed methods. To the best of our knowledge, this paper is the first work to demonstrate that bilevel optimization can be approximately solved as minimax optimization, and moreover, it provides the first implementable method with complexity guarantees for such sophisticated bilevel optimization.

Submitted 7 March, 2024; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: Accepted by SIAM Journal on Optimization

MSC Class: 90C26; 90C30; 90C47; 90C99; 65K05

arXiv:2212.08756 [pdf, other]

Multi-Scales Data Augmentation Approach In Natural Language Inference For Artifacts Mitigation And Pre-Trained Model Optimization

Authors: Zhenyuan Lu

Abstract: Machine learning models can reach high performance on benchmark natural language processing (NLP) datasets but fail in more challenging settings. We study this issue when a pre-trained model learns dataset artifacts in natural language inference (NLI), the topic of studying the logical relationship between a pair of text sequences. We provide a variety of techniques for analyzing and locating dataset artifacts inside the crowdsourced Stanford Natural Language Inference (SNLI) corpus. We study the stylistic pattern of dataset artifacts in the SNLI. To mitigate dataset artifacts, we employ a unique multi-scale data augmentation technique with two distinct frameworks: a behavioral testing checklist at the sentence level and lexical synonym criteria at the word level. Specifically, our combination method enhances our model's resistance to perturbation testing, enabling it to continuously outperform the pre-trained baseline.

Submitted 16 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

arXiv:2211.12638 [pdf, ps, other]

Projection-free Adaptive Regret with Membership Oracles

Authors: Zhou Lu, Nataly Brukhim, Paula Gradu, Elad Hazan

Abstract: In the framework of online convex optimization, most iterative algorithms require the computation of projections onto convex sets, which can be computationally expensive. To tackle this problem HK12 proposed the study of projection-free methods that replace projections with less expensive computations. The most common approach is based on the Frank-Wolfe method, which uses linear optimization computation in lieu of projections. Recent work by GK22 gave sublinear adaptive regret guarantees with projection-free algorithms based on the Frank-Wolfe approach. In this work we give projection-free algorithms that are based on a different technique, inspired by Mhammedi22, that replaces projections by set-membership computations. We propose a simple lazy gradient-based algorithm with a Minkowski regularization that attains near-optimal adaptive regret bounds. For general convex loss functions we improve previous adaptive regret bounds from $O(T^{3/4})$ to $O(\sqrt{T})$, and further to tight interval dependent bound $\tilde{O}(\sqrt{I})$ where $I$ denotes the interval length. For strongly convex functions we obtain the first poly-logarithmic adaptive regret bounds using a projection-free algorithm.

Submitted 14 December, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

arXiv:2211.04289 [pdf]

doi 10.1371/journal.pdig.0000331

Review and Analysis of Pain Research Literature through Keyword Co-occurrence Networks

Authors: Burcu Ozek, Zhenyuan Lu, Fatemeh Pouromran, Sagar Kamarthi

Abstract: Pain is a significant public health problem as the number of individuals with a history of pain globally keeps growing. In response, many synergistic research areas have been coming together to address pain-related issues. This work conducts a review and analysis of a vast body of pain-related literature using the keyword co-occurrence network (KCN) methodology. In this method, a set of KCNs is constructed by treating keywords as nodes and the co-occurrence of keywords as links between the nodes. Since keywords represent the knowledge components of research articles, analysis of KCNs will reveal the knowledge structure and research trends in the literature. This study extracted and analyzed keywords from 264,560 pain-related research articles indexed in IEEE, PubMed, Engineering Village, and Web of Science published between 2002 and 2021. We observed rapid growth in pain literature in the last two decades: the number of articles has grown nearly threefold, and the number of keywords has grown by a factor of 7. We identified emerging and declining research trends in sensors/methods, biomedical, and treatment tracks. We also extracted the most frequently co-occurring keyword pairs and clusters to help researchers recognize the synergies among different pain-related topics.

Submitted 8 November, 2022; originally announced November 2022.
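
The KCN construction described in the abstract above (keywords as nodes, co-occurrence within an article as weighted links) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_kcn`, the lowercase normalization, and the toy keyword lists are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def build_kcn(articles):
    """Build a keyword co-occurrence network from per-article keyword lists.

    Returns (nodes, edges): nodes counts the articles each keyword appears in,
    edges counts the articles in which each unordered keyword pair co-occurs.
    """
    nodes = Counter()
    edges = Counter()
    for keywords in articles:
        # Assumption: keywords are case-insensitive and counted once per article.
        unique = sorted({k.lower() for k in keywords})
        nodes.update(unique)
        # Every unordered pair of keywords in the same article is one link.
        edges.update(combinations(unique, 2))
    return nodes, edges


# Hypothetical toy input; real input would be keywords extracted from articles.
articles = [
    ["chronic pain", "opioids", "fMRI"],
    ["chronic pain", "opioids"],
    ["fMRI", "machine learning"],
]
nodes, edges = build_kcn(articles)
print(edges.most_common(1))
```

Sorting the keywords before pairing keeps each pair in a canonical order, so `("a", "b")` and `("b", "a")` are never counted as separate links.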

arXiv:2210.08385 [pdf, other]

A Joint Modeling Approach for Clustering Mixed-Type Multivariate Longitudinal Data: Application to the CHILD Cohort Study

Authors: Zhiwen Tan, Chang Shen, Padmaja Subbarao, Wendy Lou, Zihang Lu

Abstract: In epidemiological and clinical studies, identifying patients' phenotypes based on longitudinal profiles is critical to understanding the disease's developmental patterns. The current study was motivated by data from a Canadian birth cohort study, the CHILD Cohort Study. Our goal was to use multiple longitudinal respiratory traits to cluster the participants into subgroups with similar longitudinal respiratory profiles in order to identify clinically relevant disease phenotypes. To appropriately account for distinct structures and types of these longitudinal markers, we proposed a novel joint model for clustering mixed-type (continuous, discrete and categorical) multivariate longitudinal data. We also developed a Markov Chain Monte Carlo algorithm to estimate the posterior distribution of model parameters. Analyses of the CHILD Cohort data and simulated data were presented and discussed. Our study demonstrated that the proposed model serves as a useful analytical tool for clustering multivariate mixed-type longitudinal data. We developed an R package BCClong to implement the proposed model efficiently.

Submitted 21 March, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

Comments: 21 pages, 4 figures, 2 tables

arXiv:2207.05697 [pdf, ps, other]

A Newton-CG based barrier method for finding a second-order stationary point of nonconvex conic optimization with complexity guarantees

Authors: Chuan He, Zhaosong Lu

Abstract: In this paper we consider finding an approximate second-order stationary point (SOSP) of nonconvex conic optimization that minimizes a twice differentiable function over the intersection of an affine subspace and a convex cone. In particular, we propose a Newton-conjugate gradient (Newton-CG) based barrier method for finding an $(ε,\sqrt{ε})$-SOSP of this problem. Our method is not only implementable, but also achieves an iteration complexity of ${\cal O}(ε^{-3/2})$, which matches the best known iteration complexity of second-order methods for finding an $(ε,\sqrt{ε})$-SOSP of unconstrained nonconvex optimization. The operation complexity, consisting of ${\cal O}(ε^{-3/2})$ Cholesky factorizations and $\widetilde{\cal O}(ε^{-3/2}\min\{n,ε^{-1/4}\})$ other fundamental operations, is also established for our method.

Submitted 11 October, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

Comments: accepted by SIAM Journal on Optimization

MSC Class: 49M05; 49M15; 65F10; 90C06; 90C60

arXiv:2206.08435 [pdf, other]

doi 10.1109/TSP.2022.3231521

Optimal Parallel Sequential Change Detection under Generalized Performance Measures

Authors: Zexian Lu, Yunxiao Chen, Xiaoou Li

Abstract: This paper considers the detection of change points in parallel data streams, a problem widely encountered when analyzing large-scale real-time streaming data. Each stream may have its own change point, at which its data has a distributional change. With sequentially observed data, a decision maker needs to declare whether changes have already occurred to the streams at each time point. Once a stream is declared to have changed, it is deactivated permanently so that its future data will no longer be collected. This is a compound decision problem in the sense that the decision maker may want to optimize certain compound performance metrics that concern all the streams as a whole. Thus, the decisions are not independent for different streams. Our contribution is three-fold. First, we propose a general framework for compound performance metrics that includes the ones considered in the existing works as special cases and introduces new ones that connect closely with the performance metrics for single-stream sequential change detection and large-scale hypothesis testing. Second, data-driven decision procedures are developed under this framework. Finally, optimality results are established for the proposed decision procedures. The proposed methods and theory are evaluated by simulation studies and a case study.

Submitted 16 June, 2022; originally announced June 2022.

arXiv:2206.01209 [pdf, ps, other]

Accelerated first-order methods for convex optimization with locally Lipschitz continuous gradient

Authors: Zhaosong Lu, Sanyou Mei

Abstract: In this paper we develop accelerated first-order methods for convex optimization with locally Lipschitz continuous gradient (LLCG), which is beyond the well-studied class of convex optimization with Lipschitz continuous gradient. In particular, we first consider unconstrained convex optimization with LLCG and propose accelerated proximal gradient (APG) methods for solving it. The proposed APG methods are equipped with a verifiable termination criterion and enjoy an operation complexity of ${\cal O}(\varepsilon^{-1/2}\log \varepsilon^{-1})$ and ${\cal O}(\log \varepsilon^{-1})$ for finding an $\varepsilon$-residual solution of an unconstrained convex and strongly convex optimization problem, respectively. We then consider constrained convex optimization with LLCG and propose a first-order proximal augmented Lagrangian method for solving it by applying one of our proposed APG methods to approximately solve a sequence of proximal augmented Lagrangian subproblems. The resulting method is equipped with a verifiable termination criterion and enjoys an operation complexity of ${\cal O}(\varepsilon^{-1}\log \varepsilon^{-1})$ and ${\cal O}(\varepsilon^{-1/2}\log \varepsilon^{-1})$ for finding an $\varepsilon$-KKT solution of a constrained convex and strongly convex optimization problem, respectively. All the proposed methods in this paper are parameter-free or almost parameter-free, except that knowledge of the convexity parameter is required. In addition, preliminary numerical results are presented to demonstrate the performance of our proposed methods.
To the best of our knowledge, no prior studies were conducted to investigate accelerated first-order methods with complexity guarantees for convex optimization with LLCG. All the complexity results obtained in this paper are new.

Submitted 10 April, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

Comments: Accepted by SIAM Journal on Optimization

MSC Class: 90C25; 90C30; 90C46; 49M37

arXiv:2206.00973 [pdf, ps, other]

Primal-dual extrapolation methods for monotone inclusions under local Lipschitz continuity with applications to variational inequality, conic constrained saddle point, and convex conic optimization problems

Authors: Zhaosong Lu, Sanyou Mei

Abstract: In this paper we consider a class of structured monotone inclusion (MI) problems that consist of finding a zero in the sum of two monotone operators, in which one is maximal monotone while another is locally Lipschitz continuous. In particular, we first propose a primal-dual extrapolation (PDE) method for solving a structured strongly MI problem by modifying the classical forward-backward splitting method by using a point and operator extrapolation technique, in which the parameters are adaptively updated by a backtracking line search scheme. The proposed PDE method is almost parameter-free, equipped with a verifiable termination criterion, and enjoys an operation complexity of ${\cal O}(\log ε^{-1})$, measured by the amount of fundamental operations consisting only of evaluations of one operator and resolvent of another operator, for finding an $ε$-residual solution of the structured strongly MI problem. We then propose another PDE method for solving a structured non-strongly MI problem by applying the above PDE method to approximately solve a sequence of structured strongly MI problems. The resulting PDE method is parameter-free, equipped with a verifiable termination criterion, and enjoys an operation complexity of ${\cal O}(ε^{-1}\log ε^{-1})$ for finding an $ε$-residual solution of the structured non-strongly MI problem.
As a consequence, we apply the latter PDE method to convex conic optimization, conic constrained saddle point, and variational inequality problems, and obtain complexity results for finding an $ε$-KKT or $ε$-residual solution of them under local Lipschitz continuity. To the best of our knowledge, no prior studies were conducted to investigate methods with complexity guarantees for solving the aforementioned problems under local Lipschitz continuity. All the complexity results obtained in this paper are entirely new.

Submitted 24 June, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

Comments: corrected some typos

MSC Class: 47H05; 47J20; 49M29; 65K15; 90C25

arXiv:2202.05683 [pdf]

Rare event estimation with sequential directional importance sampling (SDIS)

Authors: Kai Cheng, Iason Papaioannou, Zhenzhou Lu, Xiaobo Zhang, Yanping Wang

Abstract: In this paper, we propose a sequential directional importance sampling (SDIS) method for rare event estimation. SDIS expresses a small failure probability in terms of a sequence of auxiliary failure probabilities, defined by magnifying the input variability. The first probability in the sequence is estimated with Monte Carlo simulation in Cartesian coordinates, and all the subsequent ones are computed with directional importance sampling in polar coordinates. Samples from the directional importance sampling densities used to estimate the intermediate probabilities are drawn in a sequential manner through a resample-move scheme. The latter is conveniently performed in Cartesian coordinates and directional samples are obtained through a suitable transformation. For the move step, we discuss two Markov Chain Monte Carlo (MCMC) algorithms for application in low and high-dimensional problems. Finally, an adaptive choice of the parameters defining the intermediate failure probabilities is proposed and the resulting coefficient of variation of the failure probability estimate is analyzed. The proposed SDIS method is tested on five examples in various problem settings, which demonstrate that the method outperforms existing sequential sampling reliability methods.

Submitted 12 January, 2022; originally announced February 2022.

arXiv:2110.13391 [pdf]

Analyzing the Data of COVID-19 with Quasi-Distribution Fitting Based on Piecewise B-spline Curves

Authors: Qingliang Zhao, Zhenhuan Lu, Yiduo Wang

Abstract: In response to the worldwide coronavirus disease 2019 (COVID-19) pandemic, a new fitting method (QDF, quasi-distribution fitting) for analyzing COVID-19 data is developed based on piecewise quasi-uniform B-spline curves. For any given country or district, it fits the histogram of daily confirmed cases (or other data, including daily recovery cases and daily fatality cases) of COVID-19 with piecewise quasi-uniform B-spline curves. After area normalization, the fitted curve can be regarded as a kind of probability density function (PDF), whose mathematical expectation and variance can be used to analyze the state of the coronavirus pandemic. Numerical experiments based on the data of certain countries indicate that the QDF method captures the intrinsic characteristics of the COVID-19 data of the given country or district. Because the data used in this paper span more than one year (500 days), the analysis also reveals that after multi-wave transmission of the coronavirus, the case fatality rate has declined markedly. The results show that, as an appraisal method, QDF is effective and feasible.

Submitted 25 October, 2021; originally announced October 2021.
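
The area-normalization step mentioned in the abstract above can be illustrated with a small sketch. It deliberately skips the B-spline fitting and normalizes a raw daily-count histogram directly, so it is only an approximation of the idea, not the QDF method itself; the function name `density_moments`, the unit-width daily bins, and the toy counts are assumptions.

```python
def density_moments(daily_counts):
    """Normalize daily counts to unit area and return (density, mean, variance).

    Assumes one bin per day with unit width, so dividing by the total count
    makes the histogram integrate to 1. Day indices start at 0.
    """
    total = sum(daily_counts)
    if total <= 0:
        raise ValueError("daily_counts must contain at least one positive count")
    density = [c / total for c in daily_counts]
    # Expectation and variance of the day index under the normalized density.
    mean = sum(day * p for day, p in enumerate(density))
    variance = sum((day - mean) ** 2 * p for day, p in enumerate(density))
    return density, mean, variance


# Hypothetical toy series of daily confirmed cases.
density, mean, variance = density_moments([1, 2, 1])
print(density, mean, variance)
```

Here the mean locates the center of the outbreak in time and the variance measures how spread out it is, which is the kind of summary the abstract describes.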

arXiv:2107.06089 [pdf, other]

MinP Score Tests with an Inequality Constrained Parameter Space

Authors: Giuseppe Cavaliere, Zeng-Hua Lu, Anders Rahbek, Yuhong Yang

Abstract: Score tests have the advantage of requiring estimation alone of the model restricted by the null hypothesis, which often is much simpler than models defined under the alternative hypothesis. This is typically so when the alternative hypothesis involves inequality constraints. However, existing score tests address only jointly testing all parameters of interest; a leading example is testing all ARCH parameters or variances of random coefficients being zero or not. In such testing problems rejection of the null hypothesis does not provide evidence on rejection of specific elements of the parameter of interest. This paper proposes a class of one-sided score tests for testing a model parameter that is subject to inequality constraints. The proposed tests are constructed based on the minimum of a set of $p$-values. The minimand includes the $p$-values for testing individual elements of the parameter of interest using individual scores. It may be extended to include a $p$-value of existing score tests. We show that our tests perform better than or as well as existing score tests in terms of joint testing, and furthermore have the added benefit of allowing for simultaneously testing individual elements of the parameter of interest. The added benefit is appealing in the sense that it can identify a model without estimating it. We illustrate our tests in linear regression models, ARCH and random coefficient models. A detailed simulation study is provided to examine the finite sample performance of the proposed tests and we find that our tests perform well as expected.

Submitted 13 July, 2021; originally announced July 2021.

arXiv:2106.14588 [pdf, other]

The Convergence Rate of SGD's Final Iterate: Analysis on Dimension Dependence

Authors: Daogao Liu, Zhou Lu

Abstract: Stochastic Gradient Descent (SGD) is among the simplest and most popular methods in optimization. The convergence rate for SGD has been extensively studied and tight analyses have been established for the running average scheme, but the sub-optimality of the final iterate is still not well-understood. shamir2013stochastic gave the best known upper bound for the final iterate of SGD minimizing non-smooth convex functions, which is $O(\log T/\sqrt{T})$ for Lipschitz convex functions and $O(\log T/ T)$ with an additional assumption of strong convexity. The best known lower bounds, however, are worse than the upper bounds by a factor of $\log T$. harvey2019tight gave matching lower bounds but their construction requires dimension $d= T$. It was then asked by koren2020open how to characterize the final-iterate convergence of SGD in the constant dimension setting. In this paper, we answer this question in the more general setting for any $d\leq T$, proving $Ω(\log d/\sqrt{T})$ and $Ω(\log d/T)$ lower bounds for the sub-optimality of the final iterate of SGD in minimizing non-smooth Lipschitz convex and strongly convex functions respectively with standard step size schedules. Our results provide the first general dimension dependent lower bound on the convergence of SGD's final iterate, partially resolving a COLT open question raised by koren2020open.
We also present further evidence to show the correct rate in one dimension should be $Θ(1/\sqrt{T})$, such as a proof of a tight $O(1/\sqrt{T})$ upper bound for one-dimensional special cases in settings more general than koren2020open.

Submitted 28 June, 2021; originally announced June 2021.

arXiv:2105.12921 [pdf, other]

Score test for missing at random or not

Authors: Hairu Wang, Zhiping Lu, Yukun Liu

Abstract: Missing data are frequently encountered in various disciplines and can be divided into three categories: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). Valid statistical approaches to missing data depend crucially on correct identification of the underlying missingness mechanism. Although the problem of testing whether this mechanism is MCAR or MAR has been extensively studied, there has been very little research on testing MAR versus MNAR. A critical challenge that is faced when dealing with this problem is the issue of model identification under MNAR. In this paper, under a logistic model for the missing probability, we develop two score tests for the problem of whether the missingness mechanism is MAR or MNAR under a parametric model and a semiparametric location model on the regression function. The score tests require only parameter estimation under the null MAR assumption, which completely circumvents the identification issue. Our simulations and analysis of human immunodeficiency virus data show that the score tests have well-controlled type I errors and desirable powers.

Submitted 26 May, 2021; originally announced May 2021.

Comments: 22 pages, 4 tables, 2 figures

arXiv:2103.00719 [pdf, ps, other]

doi 10.1109/TPAMI.2021.3061463

LocalDrop: A Hybrid Regularization for Deep Neural Networks

Authors: Ziqing Lu, Chang Xu, Bo Du, Takashi Ishida, Lefei Zhang, Masashi Sugiyama

Abstract: In neural networks, developing regularization algorithms to settle overfitting is one of the major study areas. We propose a new approach for the regularization of neural networks by the local Rademacher complexity called LocalDrop. A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs), including drop rates and weight matrices, has been developed based on the proposed upper bound of the local Rademacher complexity by strict mathematical deduction. The analyses of dropout in FCNs and DropBlock in CNNs with keep rate matrices in different layers are also included in the complexity analyses. With the new regularization function, we establish a two-stage procedure to obtain the optimal keep rate matrix and weight matrix to realize the whole training model. Extensive experiments have been conducted to demonstrate the effectiveness of LocalDrop in different models by comparing it with several algorithms and the effects of different hyperparameters on the final performances.

Submitted 28 February, 2021; originally announced March 2021.

arXiv:2102.05363 [pdf, other]

Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons

Authors: Bohang Zhang, Tianle Cai, Zhou Lu, Di He, Liwei Wang

Abstract: It is well-known that standard neural networks, even with a high classification accuracy, are vulnerable to small $\ell_\infty$-norm bounded adversarial perturbations. Although many attempts have been made, most previous works either can only provide empirical verification of the defense to a particular attack method, or can only develop a certified guarantee of the model robustness in limited scenarios. In this paper, we seek a new approach to develop a theoretically principled neural network that inherently resists $\ell_\infty$ perturbations. In particular, we design a novel neuron that uses $\ell_\infty$-distance as its basic operation (which we call $\ell_\infty$-dist neuron), and show that any neural network constructed with $\ell_\infty$-dist neurons (called $\ell_{\infty}$-dist net) is naturally a 1-Lipschitz function with respect to $\ell_\infty$-norm. This directly provides a rigorous guarantee of the certified robustness based on the margin of prediction outputs. We then prove that such networks have enough expressive power to approximate any 1-Lipschitz function with robust generalization guarantee. We further provide a holistic training strategy that can greatly alleviate optimization difficulties. Experimental results show that using $\ell_{\infty}$-dist nets as basic building blocks, we consistently achieve state-of-the-art performance on commonly used datasets: 93.09% certified accuracy on MNIST ($ε=0.3$), 35.42% on CIFAR-10 ($ε=8/255$) and 16.31% on TinyImageNet ($ε=1/255$).

Submitted 14 June, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: Appearing at International Conference on Machine Learning (ICML) 2021
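
The two ideas in the abstract above, a neuron built on the $\ell_\infty$ distance and a robustness certificate read off the output margin, can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's implementation: the names `linf_dist_neuron` and `is_certified`, the optional bias `b`, and the toy vectors are hypothetical. The certificate uses only the 1-Lipschitz property: if every output moves by at most $ε$ under an $\ell_\infty$ perturbation of size $ε$, the gap between two outputs moves by at most $2ε$.

```python
def linf_dist_neuron(x, w, b=0.0):
    """Return ||x - w||_inf + b, the l-infinity distance from x to weights w.

    The bias b is an assumption of this sketch; it does not affect Lipschitzness.
    """
    return max(abs(xi - wi) for xi, wi in zip(x, w)) + b


def linf_norm(v):
    """l-infinity norm of a vector."""
    return max(abs(vi) for vi in v)


def is_certified(outputs, eps):
    """Margin check for a network whose outputs are each 1-Lipschitz in l-inf.

    The predicted class cannot change under any perturbation with
    l-inf norm at most eps when the top-two margin exceeds 2 * eps.
    """
    top, second = sorted(outputs, reverse=True)[:2]
    return (top - second) > 2 * eps


# Hypothetical inputs: y is a small l-inf perturbation of x.
x = [0.2, 0.5, 0.9]
y = [0.25, 0.45, 0.95]
w = [0.0, 1.0, 0.5]

gap = abs(linf_dist_neuron(x, w) - linf_dist_neuron(y, w))
dist = linf_norm([a - c for a, c in zip(x, y)])
print(gap <= dist + 1e-12)  # 1-Lipschitz: output gap bounded by input distance
```

The Lipschitz bound for a single neuron follows from the reverse triangle inequality, and it is preserved under composition, which is why a whole network of such neurons stays 1-Lipschitz.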

arXiv:2101.11286 [pdf, ps, other]

A Note on the Representation Power of GHHs

Authors: Zhou Lu

Abstract: In this note we prove a sharp lower bound on the necessary number of nestings of nested absolute-value functions of generalized hinging hyperplanes (GHH) to represent arbitrary CPWL functions. The previous upper bound states that $n+1$ nestings are sufficient for GHH to achieve universal representation power, but the corresponding lower bound was unknown. We prove that $n$ nestings are necessary for universal representation power, which provides an almost tight lower bound. We also show that one-hidden-layer neural networks don't have universal approximation power over the whole domain. The analysis is based on a key lemma showing that any finite sum of periodic functions is either non-integrable or the zero function, which might be of independent interest.

Submitted 27 January, 2021; originally announced January 2021.

arXiv:2012.13326 [pdf, ps, other]

A Tight Lower Bound for Uniformly Stable Algorithms

Authors: Qinghua Liu, Zhou Lu

Abstract: Leveraging algorithmic stability to derive sharp generalization bounds is a classic and powerful approach in learning theory. Since Vapnik and Chervonenkis [1974] first formalized the idea for analyzing SVMs, it has been utilized to study many fundamental learning algorithms (e.g., $k$-nearest neighbors [Rogers and Wagner, 1978], stochastic gradient method [Hardt et al., 2016], linear regression [Maurer, 2017], etc.). In a recent line of great works, Feldman and Vondrak [2018, 2019] as well as Bousquet et al. [2020b] prove a high probability generalization upper bound of order $\tilde{\mathcal{O}}(γ+\frac{L}{\sqrt{n}})$ for any uniformly $γ$-stable algorithm and $L$-bounded loss function. Although much progress was achieved in proving generalization upper bounds for stable algorithms, our knowledge of lower bounds is rather limited. In fact, there is no nontrivial lower bound known ever since the study of uniform stability [Bousquet and Elisseeff, 2002], to the best of our knowledge. In this paper we fill the gap by proving a tight generalization lower bound of order $Ω(γ+\frac{L}{\sqrt{n}})$, which matches the best known upper bound up to logarithmic factors.

Submitted 24 January, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

arXiv:2011.10020 [pdf]

Modelling fertility potential in survivors of childhood cancer: An introduction to modern statistical and computational methods

Authors: L. Yu, Z. Lu, P. C. Nathan, S. Mostoufi-Moab, Y. Yuan

Abstract: Statistical and computational methods are widely used in today's scientific studies. Using female fertility potential in childhood cancer survivors as an example, we illustrate how these methods can be used to extract insight regarding biological processes from noisy observational data in order to inform decision making. We start by contextualizing the computational methods with the working example: the modelling of acute ovarian failure risk in female childhood cancer survivors to quantify the risk of permanent ovarian failure due to exposure to lifesaving but nonetheless toxic cancer treatments. This is followed by a description of the general framework of classification problems. We provide an overview of the modelling algorithms employed in our example, including one classic model (logistic regression) and two popular modern learning methods (random forest and support vector machines). Using the working example, we show the general steps of data preparation for modelling, variable selection steps for the classic model, and how model performance might be improved utilizing visualization tools. We end with a note on the importance of model evaluation.

Submitted 19 November, 2020; originally announced November 2020.

Comments: 15 pages, 9 figures and 1 table

arXiv:2010.09822 [pdf, other]

Is the new model better? One metric says yes, but the other says no. Which metric do I use?

Authors: Qian M. Zhou, Zhe Lu, Russell J. Brooke, Melissa M Hudson, Yan Yuan

Abstract: Incremental value (IncV) evaluates the performance change from an existing risk model to a new model. It is one of the key considerations in deciding whether a new risk model performs better than the existing one. Problems arise when different IncV metrics contradict each other. For example, compared with a prescribed-dose model, an ovarian-dose model for predicting acute ovarian failure has a slightly lower area under the receiver operating characteristic curve (AUC) but increases the area under the precision-recall curve (AP) by 48%. This phenomenon of conflicting conclusions is not uncommon, and it creates a dilemma in medical decision making. In this article, we examine the analytical connections and differences between two IncV metrics: IncV in AUC (IncV-AUC) and IncV in AP (IncV-AP). Additionally, since they are both semi-proper scoring rules, we compare them with a strictly proper scoring rule: the IncV of the scaled Brier score (IncV-sBrS), via a numerical study. We demonstrate that both IncV-AUC and IncV-AP are weighted averages of the changes (from the existing model to the new one) in separating the risk score distributions between events and non-events. However, IncV-AP assigns heavier weights to the changes in the high-risk group, whereas IncV-AUC weights the changes equally. In the numerical study, we find that IncV-AP has a wide range, from negative to positive, but the size of IncV-AUC is much smaller. In addition, IncV-AP and IncV-sBrS are highly consistent, but IncV-AUC is negatively correlated with IncV-sBrS and IncV-AP at a low event rate. IncV-AUC and IncV-AP are the least consistent among the three pairs, and their differences are more pronounced as the event rate decreases.

Submitted 15 December, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: 25 pages, 6 figures, 1 table. Compared to Version 1, the title and overall structure of the manuscript have been changed significantly

arXiv:2006.07584 [pdf, other]

Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation

Authors: Zhiyun Lu, Eugene Ie, Fei Sha

Abstract: Many methods have been proposed to quantify the predictive uncertainty associated with the outputs of deep neural networks. Among them, ensemble methods often lead to state-of-the-art results, though they require modifications to the training procedures and are computationally costly for both training and inference. In this paper, we propose a new single-model based approach. The main idea is inspired by the observation that we can "simulate" an ensemble of models by drawing from a Gaussian distribution, with a form similar to those from the asymptotic normality theory, infinitesimal Jackknife, Laplacian approximation to Bayesian neural networks, and trajectories in stochastic gradient descents. However, instead of using each model in the "ensemble" to predict and then aggregating their predictions, we integrate the Gaussian distribution and the softmax outputs of the neural networks. We use a mean-field approximation formula to compute this analytically intractable integral. The proposed approach has several appealing properties: it functions as an ensemble without requiring multiple models, and it enables closed-form approximate inference using only the first and second moments of the Gaussian. Empirically, the proposed approach performs competitively when compared to state-of-the-art methods, including deep ensembles, temperature scaling, dropout and Bayesian NNs, on standard uncertainty estimation tasks. It also outperforms many methods on out-of-distribution detection.

Submitted 9 May, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

arXiv:2006.06455 [pdf, other]

Learning Individually Inferred Communication for Multi-Agent Cooperation

Authors: Ziluo Ding, Tiejun Huang, Zongqing Lu

Abstract: Communication lays the foundation for human cooperation. It is also crucial for multi-agent cooperation. However, existing work focuses on broadcast communication, which is not only impractical but also leads to information redundancy that could even impair the learning process. To tackle these difficulties, we propose Individually Inferred Communication (I2C), a simple yet effective model to enable agents to learn a prior for agent-agent communication. The prior knowledge is learned via causal inference and realized by a feed-forward neural network that maps the agent's local observation to a belief about who to communicate with. The influence of one agent on another is inferred via the joint action-value function in multi-agent reinforcement learning and quantified to label the necessity of agent-agent communication. Furthermore, the agent policy is regularized to better exploit communicated messages. Empirically, we show that I2C can not only reduce communication overhead but also improve the performance in a variety of multi-agent cooperative scenarios, compared to existing methods. The code is available at https://github.com/PKU-AI-Edge/I2C.

Submitted 28 April, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: NeurIPS 2020, oral presentation. The code is available at https://github.com/PKU-AI-Edge/I2C

arXiv:2006.05842 [pdf, other]

The Emergence of Individuality

Authors: Jiechuan Jiang, Zongqing Lu

Abstract: Individuality is essential in human society, which induces the division of labor and thus improves the efficiency and productivity. Similarly, it should also be the key to multi-agent cooperation. Inspired by the idea that individuality means being an individual separate from others, we propose a simple yet efficient method for the emergence of individuality (EOI) in multi-agent reinforcement learning (MARL). EOI learns a probabilistic classifier that predicts a probability distribution over agents given their observation and gives each agent an intrinsic reward of being correctly predicted by the classifier. The intrinsic reward encourages the agents to visit their own familiar observations, and learning the classifier by such observations makes the intrinsic reward signals stronger and the agents more identifiable. To further enhance the intrinsic reward and promote the emergence of individuality, two regularizers are proposed to increase the discriminability of the classifier. We implement EOI on top of popular MARL algorithms. Empirically, we show that EOI significantly outperforms existing methods in a variety of multi-agent cooperative scenarios.

Submitted 18 October, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: The extended version of ICML 2021 paper

arXiv:2004.05707 [pdf, other]

VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification

Authors: Zhibin Lu, Pan Du, Jian-Yun Nie

Abstract: Much progress has been made recently on text classification with methods based on neural networks. In particular, models using attention mechanisms, such as BERT, have been shown to be capable of capturing the contextual information within a sentence or document. However, their ability to capture the global information about the vocabulary of a language is more limited. The latter is the strength of Graph Convolutional Networks (GCN). In this paper, we propose the VGCN-BERT model, which combines the capability of BERT with a Vocabulary Graph Convolutional Network (VGCN). Local information and global information interact through different layers of BERT, allowing them to influence each other and to build together a final representation for classification. In our experiments on several text classification datasets, our approach outperforms BERT and GCN alone, and achieves higher effectiveness than that reported in previous studies.

Submitted 12 April, 2020; originally announced April 2020.

Comments: 12 pages, 2 figures

ACM Class: I.2.4; I.2.7

Journal ref: in J. M. Jose et al. (Eds.): ECIR 2020, LNCS 12035, pp.369-382, 2020

arXiv:2002.12641 [pdf, other]

AdarGCN: Adaptive Aggregation GCN for Few-Shot Learning

Authors: Jianhong Zhang, Manli Zhang, Zhiwu Lu, Tao Xiang, Jirong Wen

Abstract: Existing few-shot learning (FSL) methods assume that there exist sufficient training samples from source classes for knowledge transfer to target classes with few training samples. However, this assumption is often invalid, especially when it comes to fine-grained recognition. In this work, we define a new FSL setting termed few-shot fewshot learning (FSFSL), under which both the source and target classes have limited training samples. To overcome the source class data scarcity problem, a natural option is to crawl images from the web with class names as search keywords. However, the crawled images are inevitably corrupted by a large amount of noise (irrelevant images) and thus may harm the performance. To address this problem, we propose a graph convolutional network (GCN)-based label denoising (LDN) method to remove the irrelevant images. Further, with the cleaned web images as well as the original clean training images, we propose a GCN-based FSL method. For both the LDN and FSL tasks, a novel adaptive aggregation GCN (AdarGCN) model is proposed, which differs from existing GCN models in that adaptive aggregation is performed based on a multi-head multi-level aggregation module. With AdarGCN, how much and how far information carried by each graph node is propagated in the graph structure can be determined automatically, therefore alleviating the effects of both noisy and outlying training samples. Extensive experiments show the superior performance of our AdarGCN under both the new FSFSL and the conventional FSL settings.

Submitted 9 March, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

Comments: The code is at github - https://github.com/RiceZJH/AdarGCN

arXiv:2002.06856 [pdf, other]

Data and Model Dependencies of Membership Inference Attack

Authors: Shakila Mahjabin Tonni, Dinusha Vatsalan, Farhad Farokhi, Dali Kaafar, Zhigang Lu, Gioacchino Tangari

Abstract: Machine learning (ML) models have been shown to be vulnerable to Membership Inference Attacks (MIA), which infer the membership of a given data point in the target dataset by observing the prediction output of the ML model. While the key factors for the success of MIA have not yet been fully understood, existing defense mechanisms such as using L2 regularization (Shokri et al., 2017) and dropout layers (Salem et al., 2018) take only the model's overfitting property into consideration. In this paper, we provide an empirical analysis of the impact of both the data and ML model properties on the vulnerability of ML techniques to MIA. Our results reveal the relationship between MIA accuracy and properties of the dataset and training model in use. In particular, we show that the size of the shadow dataset, the class and feature balance and the entropy of the target dataset, and the configurations and fairness of the training model are the most influential factors. Based on those experimental findings, we conclude that along with model overfitting, multiple properties jointly contribute to MIA success instead of any single property. Building on our experimental findings, we propose using those data and model properties as regularizers to protect ML models against MIA. Our results show that the proposed defense mechanisms can reduce the MIA accuracy by up to 25% without sacrificing the ML model prediction utility.

Submitted 25 July, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

arXiv:2002.04274

Meta-Learning across Meta-Tasks for Few-Shot Learning

Authors: Nanyi Fei, Zhiwu Lu, Yizhao Gao, Jia Tian, Tao Xiang, Ji-Rong Wen

Abstract: Existing meta-learning based few-shot learning (FSL) methods typically adopt an episodic training strategy whereby each episode contains a meta-task. Across episodes, these tasks are sampled randomly and their relationships are ignored. In this paper, we argue that the inter-meta-task relationships should be exploited and that those tasks should be sampled strategically to assist in meta-learning. Specifically, we consider the relationships defined over two types of meta-task pairs and propose different strategies to exploit them. (1) Two meta-tasks with disjoint sets of classes: this pair is interesting because it is reminiscent of the relationship between the source seen classes and target unseen classes, featured with domain gap caused by class differences. A novel learning objective termed meta-domain adaptation (MDA) is proposed to make the meta-learned model more robust to the domain gap. (2) Two meta-tasks with identical sets of classes: this pair is useful because it can be employed to learn models that are robust against poorly sampled few-shots. To that end, a novel meta-knowledge distillation (MKD) objective is formulated. There are some mistakes in the experiments. We thus choose to withdraw this paper.

Submitted 26 September, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

Comments: There are some mistakes in the experiments. We thus choose to withdraw this paper

arXiv:2002.02050

Few-Shot Learning as Domain Adaptation: Algorithm and Analysis

Authors: Jiechao Guan, Zhiwu Lu, Tao Xiang, Ji-Rong Wen

Abstract: To recognize the unseen classes with only few samples, few-shot learning (FSL) uses prior knowledge learned from the seen classes. A major challenge for FSL is that the distribution of the unseen classes is different from that of those seen, resulting in poor generalization even when a model is meta-trained on the seen classes. This class-difference-caused distribution shift can be considered as a special case of domain shift. In this paper, for the first time, we propose a domain adaptation prototypical network with attention (DAPNA) to explicitly tackle such a domain shift problem in a meta-learning framework. Specifically, armed with a set transformer based attention module, we construct each episode with two sub-episodes without class overlap on the seen classes to simulate the domain shift between the seen and unseen classes. To align the feature distributions of the two sub-episodes with limited training samples, a feature transfer network is employed together with a margin disparity discrepancy (MDD) loss. Importantly, theoretical analysis is provided to give the learning bound of our DAPNA. Extensive experiments show that our DAPNA outperforms the state-of-the-art FSL alternatives, often by significant margins.

Submitted 27 July, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

Comments: There exist some mistakes in the experiments

arXiv:1911.01545 [pdf, other]

Compositional Generalization with Tree Stack Memory Units

Authors: Forough Arabshahi, Zhichu Lu, Pranay Mundra, Sameer Singh, Animashree Anandkumar

Abstract: We study compositional generalization, viz., the problem of zero-shot generalization to novel compositions of concepts in a domain. Standard neural networks fail to a large extent on compositional learning. We propose Tree Stack Memory Units (Tree-SMU) to enable strong compositional generalization. Tree-SMU is a recursive neural network with Stack Memory Units (SMUs), a novel memory augmented neural network whose memory has a differentiable stack structure. Each SMU in the tree architecture learns to read from its stack and to write to it by combining the stacks and states of its children through gating. The stack helps capture long-range dependencies in the problem domain, thereby enabling compositional generalization. Additionally, the stack also preserves the ordering of each node's descendants, thereby retaining locality on the tree. We demonstrate strong empirical results on two mathematical reasoning benchmarks. We use four compositionality tests to assess the generalization performance of Tree-SMU and show that it enables accurate compositional generalization compared to strong baselines such as Transformers and Tree-LSTMs.

Submitted 15 October, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

arXiv:1910.14472 [pdf, other]

Learning Fairness in Multi-Agent Systems

Authors: Jiechuan Jiang, Zongqing Lu

Abstract: Fairness is essential for human society, contributing to stability and productivity. Similarly, fairness is also the key for many multi-agent systems. Taking fairness into multi-agent learning could help multi-agent systems become both efficient and stable. However, learning efficiency and fairness simultaneously is a complex, multi-objective, joint-policy optimization. To tackle these difficulties, we propose FEN, a novel hierarchical reinforcement learning model. We first decompose fairness for each agent and propose a fair-efficient reward that each agent learns its own policy to optimize. To avoid multi-objective conflict, we design a hierarchy consisting of a controller and several sub-policies, where the controller maximizes the fair-efficient reward by switching among the sub-policies that provide diverse behaviors to interact with the environment. FEN can be trained in a fully decentralized way, making it easy to deploy in real-world applications. Empirically, we show that FEN easily learns both fairness and efficiency and significantly outperforms baselines in a variety of multi-agent scenarios.

Submitted 31 October, 2019; originally announced October 2019.

Comments: NeurIPS'19

arXiv:1909.03044 [pdf]

Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records

Authors: Qingyu Chen, Jingcheng Du, Sun Kim, W. John Wilbur, Zhiyong Lu

Abstract: Capturing sentence semantics plays a vital role in a range of text mining applications. Despite continuous efforts on the development of related datasets and models in the general domain, both datasets and models are limited in biomedical and clinical domains. The BioCreative/OHNLP organizers have made the first attempt to annotate 1,068 sentence pairs from clinical notes and have called for a community effort to tackle the Semantic Textual Similarity (BioCreative/OHNLP STS) challenge. We developed models using traditional machine learning and deep learning approaches. For the post challenge, we focus on two models: the Random Forest and the Encoder Network. We applied sentence embeddings pre-trained on PubMed abstracts and MIMIC-III clinical notes and updated the Random Forest and the Encoder Network accordingly. The official results demonstrated our best submission was the ensemble of eight models. It achieved a Pearson correlation coefficient of 0.8328, the highest performance among 13 submissions from 4 teams. For the post challenge, the performance of both Random Forest and the Encoder Network was improved; in particular, the correlation of the Encoder Network was improved by ~13%. During the challenge task, no end-to-end deep learning models had better performance than machine learning models that take manually-crafted features. In contrast, with the sentence embeddings pre-trained on biomedical corpora, the Encoder Network now achieves a correlation of ~0.84, which is higher than the original best model. The ensembled model taking the improved versions of the Random Forest and Encoder Network as inputs further increased performance to 0.8528. Deep learning models with sentence embeddings pre-trained on biomedical corpora achieve the highest performance on the test set.

Submitted 6 September, 2019; originally announced September 2019.

Comments: 15 pages, 5 figures, 2 tables

arXiv:1908.08401 [pdf, ps, other]

A Deep Actor-Critic Reinforcement Learning Framework for Dynamic Multichannel Access

Authors: Chen Zhong, Ziyang Lu, M. Cenk Gursoy, Senem Velipasalar

Abstract: To make efficient use of limited spectral resources, in this work we propose a deep actor-critic reinforcement learning based framework for dynamic multichannel access. We consider both a single-user case and a scenario in which multiple users attempt to access channels simultaneously. We employ the proposed framework as a single agent in the single-user case, and extend it to a decentralized multi-agent framework in the multi-user scenario. In both cases, we develop algorithms for the actor-critic deep reinforcement learning and evaluate the proposed learning policies via experiments and numerical results. In the single-user model, in order to evaluate the performance of the proposed channel access policy and the framework's tolerance against uncertainty, we explore different channel switching patterns and different switching probabilities. In the case of multiple users, we analyze the probabilities of each user accessing channels with favorable channel conditions and the probability of collision. We also address a time-varying environment to identify the adaptive ability of the proposed framework. Additionally, we provide comparisons (in terms of both the average reward and time efficiency) between the proposed actor-critic deep reinforcement learning framework, Deep-Q network (DQN) based approach, random access, and the optimal policy when the channel dynamics are known.

Submitted 20 August, 2019; originally announced August 2019.

Comments: 14 figures. arXiv admin note: text overlap with arXiv:1810.03695

arXiv:1907.13177 [pdf, ps, other]

doi 10.1109/TBME.2020.3020381

Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning

Authors: Huy Phan, Oliver Y. Chén, Philipp Koch, Zongqing Lu, Ian McLoughlin, Alfred Mertins, Maarten De Vos

Abstract: Background: Despite recent significant progress in the development of automatic sleep staging methods, building a good model still remains a big challenge for sleep studies with a small cohort due to the data-variability and data-inefficiency issues. This work presents a deep transfer learning approach to overcome these issues and enable transferring knowledge from a large dataset to a small cohort for automatic sleep staging. Methods: We start from a generic end-to-end deep learning framework for sequence-to-sequence sleep staging and derive two networks as the means for transfer learning. The networks are first trained in the source domain (i.e. the large database). The pretrained networks are then finetuned in the target domain (i.e. the small cohort) to complete knowledge transfer. We employ the Montreal Archive of Sleep Studies (MASS) database consisting of 200 subjects as the source domain and study deep transfer learning on three different target domains: the Sleep Cassette subset and the Sleep Telemetry subset of the Sleep-EDF Expanded database, and the Surrey-cEEGrid database. The target domains are purposely adopted to cover different degrees of data mismatch to the source domains. Results: Our experimental results show significant performance improvement on automatic sleep staging on the target domains achieved with the proposed deep transfer learning approach. Conclusions: These results suggest the efficacy of the proposed approach in addressing the above-mentioned data-variability and data-inefficiency issues. Significance: As a consequence, it would enable one to improve the quality of automatic sleep staging models when the amount of data is relatively small. The source code and the pretrained models are available at http://github.com/pquochuy/sleep_transfer_learning.

Submitted 27 August, 2020; v1 submitted 30 July, 2019; originally announced July 2019.

Comments: This article has been published in IEEE Transactions on Biomedical Engineering

arXiv:1907.01175 [pdf, other]

Volatility Analysis with Realized GARCH-Ito Models

Authors: Xinyu Song, Donggyu Kim, Huiling Yuan, Xiangyu Cui, Zhiping Lu, Yong Zhou, Yazhen Wang

Abstract: This paper introduces a unified approach for modeling high-frequency financial data that can accommodate both the continuous-time jump-diffusion and discrete-time realized GARCH model by embedding the discrete realized GARCH structure in the continuous instantaneous volatility process. The key feature of the proposed model is that the corresponding conditional daily integrated volatility adopts an autoregressive structure where both integrated volatility and jump variation serve as innovations. We name it the realized GARCH-Ito model. Given the autoregressive structure in the conditional daily integrated volatility, we propose a quasi-likelihood function for parameter estimation and establish its asymptotic properties. To improve the parameter estimation, we propose a joint quasi-likelihood function that combines the daily integrated volatility estimated from high-frequency data with a nonparametric volatility estimator obtained from option data. We conduct a simulation study to check the finite sample performance of the proposed methodologies and an empirical study with the S&P500 stock index and option data.

Submitted 15 June, 2020; v1 submitted 2 July, 2019; originally announced July 2019.

Comments: 39 pages, 4 tables, 3 figures

arXiv:1906.08720 [pdf, other]

Boosting for Control of Dynamical Systems

Authors: Naman Agarwal, Nataly Brukhim, Elad Hazan, Zhou Lu

Abstract: We study the question of how to aggregate controllers for dynamical systems in order to improve their performance. To this end, we propose a framework of boosting for online control. Our main result is an efficient boosting algorithm that combines weak controllers into a provably more accurate one. Empirical evaluation on a host of control settings supports our theoretical findings.

Submitted 23 February, 2020; v1 submitted 20 June, 2019; originally announced June 2019.

arXiv:1905.03041 [pdf, other]

doi 10.1145/3308558.3308558.3313622

Tag2Vec: Learning Tag Representations in Tag Networks

Authors: Junshan Wang, Zhicong Lu, Guojie Song, Yue Fan, Lun Du, Wei Lin

Abstract: Network embedding is a method to learn low-dimensional representation vectors for nodes in complex networks. In real networks, nodes may have multiple tags but existing methods ignore the abundant semantic and hierarchical information of tags. This information is useful to many network applications and usually very stable. In this paper, we propose a tag representation learning model, Tag2Vec, which mixes nodes and tags into a hybrid network. Firstly, for tag networks, we define semantic distance as the proximity between tags and design a novel strategy, parameterized random walk, to generate context with semantic and hierarchical information of tags adaptively. Then, we propose a hyperbolic Skip-gram model to better express the complex hierarchical structure with lower output dimensions. We evaluate our model on the NBER U.S. patent dataset and WordNet dataset. The results show that our model can learn tag representations with rich semantic information and it outperforms other baselines.

Submitted 24 September, 2020; v1 submitted 19 April, 2019; originally announced May 2019.

Comments: 6 pages

Showing 1–50 of 89 results for author: Lu, Z