Search | arXiv e-print repository

A Class of Directed Acyclic Graphs with Mixed Data Types in Mediation Analysis

Authors: Wei Hao, Canyi Chen, Peter X. -K. Song

Abstract: We propose a unified class of generalized structural equation models (GSEMs) with data of mixed types in mediation analysis, including continuous, categorical, and count variables. Such models extend substantially the classical linear structural equation model to accommodate many data types arising from the application of mediation analysis. Invoking the hierarchical modeling approach, we specify GSEMs by a copula joint distribution of outcome variable, mediator and exposure variable, in which marginal distributions are built upon generalized linear models (GLMs) with confounding factors. We discuss the identifiability conditions for the causal mediation effects in the counterfactual paradigm as well as the issue of mediation leakage, and develop an asymptotically efficient profile maximum likelihood estimation and inference for two key mediation estimands, natural direct effect and natural indirect effect, in different scenarios of mixed data types. The proposed new methodology is illustrated by a motivating epidemiological study that aims to investigate whether the tempo of reaching infancy BMI peak (delay or on time), an important early life growth milestone, may mediate the association between prenatal exposure to phthalates and pubertal health outcomes.

Submitted 4 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: 33 pages, 3 figures, 3 tables

arXiv:2311.16628 [pdf, ps, other]

Symmetry-regularized neural ordinary differential equations

Authors: Wenbo Hao

Abstract: Neural Ordinary Differential Equations (Neural ODEs) are a class of deep neural network models that interpret the hidden state dynamics of neural networks as an ordinary differential equation, and are thereby capable of capturing system dynamics in a continuous time framework. In this work, I integrate symmetry regularization into Neural ODEs. In particular, I use continuous Lie symmetry of ODEs and PDEs associated with the model to derive conservation laws and add them to the loss function, making it physics-informed. This incorporation of inherent structural properties into the loss function could significantly improve robustness and stability of the model during training. To illustrate this method, I employ a toy model that utilizes a cosine rate of change in the hidden state, showcasing the process of identifying Lie symmetries, deriving conservation laws, and constructing a new loss function.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2309.10301 [pdf, other]

Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms

Authors: Keru Wu, Yuansi Chen, Wooseok Ha, Bin Yu

Abstract: Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that of the target data used to evaluate the model. While many DA algorithms have demonstrated considerable empirical success, blindly applying these algorithms can often lead to worse performance on new datasets. To address this, it is crucial to clarify the assumptions under which a DA algorithm has good target performance. In this work, we focus on the assumption of the presence of conditionally invariant components (CICs), which are relevant for prediction and remain conditionally invariant across the source and target data. We demonstrate that CICs, which can be estimated through conditional invariant penalty (CIP), play three prominent roles in providing target risk guarantees in DA. First, we propose a new algorithm based on CICs, importance-weighted conditional invariant penalty (IW-CIP), which has target risk guarantees beyond simple settings such as covariate shift and label shift. Second, we show that CICs help identify large discrepancies between source and target risks of other DA algorithms. Finally, we demonstrate that incorporating CICs into the domain invariant projection (DIP) algorithm can address its failure scenario caused by label-flipping features. We support our new algorithms and theoretical findings via numerical experiments on synthetic data, MNIST, CelebA, and Camelyon17 datasets.

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2308.03215 [pdf, other]

The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning

Authors: Nikhil Ghosh, Spencer Frei, Wooseok Ha, Bin Yu

Abstract: In this work, we investigate the dynamics of stochastic gradient descent (SGD) when training a single-neuron autoencoder with linear or ReLU activation on orthogonal data. We show that for this non-convex problem, randomly initialized SGD with a constant step size successfully finds a global minimum for any batch size choice. However, the particular global minimum found depends upon the batch size. In the full-batch setting, we show that the solution is dense (i.e., not sparse) and is highly aligned with its initialized direction, showing that relatively little feature learning occurs. On the other hand, for any batch size strictly smaller than the number of samples, SGD finds a global minimum which is sparse and nearly orthogonal to its initialization, showing that the randomness of stochastic gradients induces a qualitatively different type of "feature selection" in this setting. Moreover, if we measure the sharpness of the minimum by the trace of the Hessian, the minima found with full batch gradient descent are flatter than those found with strictly smaller batch sizes, in contrast to previous works which suggest that large batches lead to sharper minima. To prove convergence of SGD with a constant step size, we introduce a powerful tool from the theory of non-homogeneous random walks which may be of independent interest.

Submitted 6 August, 2023; originally announced August 2023.

arXiv:2306.17347 [pdf, other]

Mediation with External Summary Statistic Information (MESSI)

Authors: Jonathan Boss, Wei Hao, Amber Cathey, Barrett M. Welch, Kelly K. Ferguson, John D. Meeker, Jian Kang, Bhramar Mukherjee

Abstract: Environmental health studies are increasingly measuring endogenous omics data ($\boldsymbol{M}$) to study intermediary biological pathways by which an exogenous exposure ($\boldsymbol{A}$) affects a health outcome ($\boldsymbol{Y}$), given confounders ($\boldsymbol{C}$). Mediation analysis is frequently carried out to understand such mechanisms. If intermediary pathways are of interest, then there is likely literature establishing statistical and biological significance of the total effect, defined as the effect of $\boldsymbol{A}$ on $\boldsymbol{Y}$ given $\boldsymbol{C}$. For mediation models with continuous outcomes and mediators, we show that leveraging external summary-level information on the total effect improves estimation efficiency of the natural direct and indirect effects. Moreover, the efficiency gain depends on the asymptotic partial $R^2$ between the outcome ($\boldsymbol{Y}\mid\boldsymbol{M},\boldsymbol{A},\boldsymbol{C}$) and total effect ($\boldsymbol{Y}\mid\boldsymbol{A},\boldsymbol{C}$) models, with smaller (larger) values benefiting direct (indirect) effect estimation. We robustify our estimation procedure to incongenial external information by assuming the total effect follows a random distribution. This framework allows shrinkage towards the external information if the total effects in the internal and external populations agree. We illustrate our methodology using data from the Puerto Rico Testsite for Exploring Contamination Threats, where Cytochrome p450 metabolites are hypothesized to mediate the effect of phthalate exposure on gestational age at delivery. External information on the total effect comes from a recently published pooled analysis of 16 studies. The proposed framework blends mediation analysis with emerging data integration techniques.

Submitted 28 June, 2023; originally announced June 2023.

Comments: 32 pages, 6 figures

arXiv:2203.13293 [pdf, other]

Methods for Large-scale Single Mediator Hypothesis Testing: Possible Choices and Comparisons

Authors: Jiacong Du, Xiang Zhou, Wei Hao, Yongmei Liu, Jennifer A. Smith, Bhramar Mukherjee

Abstract: Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, $H_0: \alpha\beta = 0$ ($\alpha$: effect of the exposure on the mediator after adjusting for confounders; $\beta$: effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for multiple mediation hypothesis testing. In addition to these existing methods, we developed the Sobel-comp method, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies to compare all six methods in terms of the false positive rates under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the class of methods which uses a mixture reference distribution could best maintain the false positive rates at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi-Ethnic Study of Atherosclerosis (MESA). We also provide guidelines for choosing the optimal mediation hypothesis testing method in practice.

Submitted 24 March, 2022; originally announced March 2022.

Comments: 24 pages, 6 figures, 4 tables
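
Note: the abstract above refers to Sobel's test statistic for the product of the two path coefficients. As a rough illustration only, the sketch below computes the classical first-order Sobel z-statistic and a p-value under a standard normal reference. It is not the Sobel-comp method: the corrected mixture reference distribution is not specified in the abstract and is not reproduced here. The function names and the example numbers are hypothetical.

```python
import math


def sobel_z(alpha, se_alpha, beta, se_beta):
    """Classical first-order Sobel z-statistic for the product alpha * beta.

    alpha, beta: estimated exposure->mediator and mediator->outcome effects.
    se_alpha, se_beta: their estimated standard errors.
    Raises ZeroDivisionError if the delta-method standard error is zero.
    """
    se_product = math.sqrt(alpha ** 2 * se_beta ** 2 + beta ** 2 * se_alpha ** 2)
    return (alpha * beta) / se_product


def two_sided_p_standard_normal(z):
    """Two-sided p-value under a standard normal reference (an assumption;
    the abstract's mixture-reference methods replace this step)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical estimates, chosen only to exercise the functions.
z = sobel_z(alpha=0.5, se_alpha=0.1, beta=0.4, se_beta=0.1)
p = two_sided_p_standard_normal(z)
print(f"z = {z:.4f}, p = {p:.4g}")
```

Under the composite null, where one or both of the coefficients is zero, the standard normal reference is known to be conservative, which is the motivation the abstract gives for mixture reference distributions.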

arXiv:2108.06847 [pdf, other]

Interpreting and improving deep-learning models with reality checks

Authors: Chandan Singh, Wooseok Ha, Bin Yu

Abstract: Recent deep-learning models have achieved impressive predictive performance by learning complex functions of many variables, often at the cost of interpretability. This chapter covers recent work aiming to interpret models by attributing importance to features and feature groups for a single prediction. Importantly, the proposed attributions assign importance to interactions between features, in addition to features in isolation. These attributions are shown to yield insights across real-world domains, including bio-imaging, cosmology image and natural-language processing. We then show how these attributions can be used to directly improve the generalization of a neural network or to distill it into a simple model. Throughout the chapter, we emphasize the use of reality checks to scrutinize the proposed interpretation techniques.

Submitted 18 August, 2021; v1 submitted 15 August, 2021; originally announced August 2021.

arXiv:2107.09145 [pdf, other]

Adaptive wavelet distillation from neural networks through interpretations

Authors: Wooseok Ha, Chandan Singh, Francois Lanusse, Srigokul Upadhyayula, Bin Yu

Abstract: Recent deep-learning models have achieved impressive prediction performance, but often sacrifice interpretability and computational efficiency. Interpretability is crucial in many disciplines, such as science and medicine, where models must be carefully vetted or where interpretation is the goal itself. Moreover, interpretable models are concise and often yield computational efficiency. Here, we propose adaptive wavelet distillation (AWD), a method which aims to distill information from a trained neural network into a wavelet transform. Specifically, AWD penalizes feature attributions of a neural network in the wavelet domain to learn an effective multi-resolution wavelet transform. The resulting model is highly predictive, concise, computationally efficient, and has properties (such as a multi-scale structure) which make it easy to interpret. In close collaboration with domain experts, we showcase how AWD addresses challenges in two real-world settings: cosmological parameter inference and molecular-partner prediction. In both cases, AWD yields a scientifically interpretable and concise model which gives predictive performance better than state-of-the-art neural networks. Moreover, AWD identifies predictive features that are scientifically meaningful in the context of respective domains. All code and models are released in a full-fledged package available on GitHub (https://github.com/Yu-Group/adaptive-wavelets).

Submitted 26 August, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

arXiv:2104.13417 [pdf, other]

Towards Fair Federated Learning with Zero-Shot Data Augmentation

Authors: Weituo Hao, Mostafa El-Khamy, Jungwon Lee, Jianyi Zhang, Kevin J Liang, Changyou Chen, Lawrence Carin

Abstract: Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data. Although it is recognized that statistical heterogeneity of the client local data yields slower global model convergence, it is less commonly recognized that it also yields a biased federated global model with a high variance of accuracy across clients. In this work, we aim to provide federated learning schemes with improved fairness. To tackle this challenge, we propose a novel federated learning system that employs zero-shot data augmentation on under-represented data to mitigate statistical heterogeneity and encourage more uniform accuracy performance across clients in federated networks. We study two variants of this scheme, Fed-ZDAC (federated learning with zero-shot data augmentation at the clients) and Fed-ZDAS (federated learning with zero-shot data augmentation at the server). Empirical results on a suite of datasets demonstrate the effectiveness of our methods on simultaneously improving the test accuracy and fairness.

Submitted 27 April, 2021; originally announced April 2021.

Comments: Accepted by IEEE CVPR Workshop on Fair, Data Efficient And Trusted Computer Vision

arXiv:2011.00593 [pdf, other]

MixKD: Towards Efficient Distillation of Large-scale Language Models

Authors: Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

Abstract: Large-scale language models have recently demonstrated impressive empirical performance. Nevertheless, the improved results are attained at the price of bigger models, more power consumption, and slower inference, which hinder their applicability to low-resource (both memory and computation) platforms. Knowledge distillation (KD) has been demonstrated as an effective framework for compressing such big models. However, large-scale neural network systems are prone to memorize training instances, and thus tend to make inconsistent predictions when the data distribution is altered slightly. Moreover, the student model has few opportunities to request useful information from the teacher model when there is limited task-specific data available. To address these issues, we propose MixKD, a data-agnostic distillation framework that leverages mixup, a simple yet efficient data augmentation approach, to endow the resulting model with stronger generalization ability. Concretely, in addition to the original training examples, the student model is encouraged to mimic the teacher's behavior on the linear interpolation of example pairs as well. We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error. To verify its effectiveness, we conduct experiments on the GLUE benchmark, where MixKD consistently leads to significant gains over the standard KD training, and outperforms several competitive baselines. Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.

Submitted 17 March, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

Comments: ICLR 2021 Camera Ready
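
Note: the abstract above describes training on "the linear interpolation of example pairs" (mixup). A minimal sketch of that interpolation step on plain feature vectors follows. Which representation MixKD interpolates and how the mixing weight is chosen are not stated in the abstract; drawing the weight from a symmetric Beta distribution is an assumption borrowed from common mixup practice, and the function names and the default `alpha=0.4` are hypothetical.

```python
import random


def mixup_pair(x_i, x_j, lam):
    """Return the convex combination lam * x_i + (1 - lam) * x_j, element-wise."""
    if len(x_i) != len(x_j):
        raise ValueError("both examples must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]


def sample_mixing_weight(alpha=0.4, rng=random):
    """Draw a mixing weight from Beta(alpha, alpha).

    The Beta prior and the default alpha are assumptions, not taken from
    the abstract.
    """
    return rng.betavariate(alpha, alpha)


# Fixed weight so the result is easy to check by hand.
mixed = mixup_pair([0.0, 2.0], [4.0, 6.0], lam=0.25)
print(mixed)

# Random weight, as would typically be used during training.
lam = sample_mixing_weight()
print(mixup_pair([0.0, 2.0], [4.0, 6.0], lam))
```

In a distillation setting, the same weight would typically be used to obtain the teacher's output on the interpolated input, which the student is then trained to match.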

arXiv:2008.05687 [pdf, other]

WAFFLe: Weight Anonymized Factorization for Federated Learning

Authors: Weituo Hao, Nikhil Mehta, Kevin J Liang, Pengyu Cheng, Mostafa El-Khamy, Lawrence Carin

Abstract: In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices. In light of this need, federated learning has emerged as a popular training paradigm. However, many federated learning approaches trade transmitting data for communicating updated weight parameters for each local device. Therefore, a successful breach that would have otherwise directly compromised the data instead grants whitebox access to the local model, which opens the door to a number of attacks, including exposing the very data federated learning seeks to protect. Additionally, in distributed scenarios, individual client devices commonly exhibit high statistical heterogeneity. Many common federated approaches learn a single global model; while this may do well on average, performance degrades when the i.i.d. assumption is violated, underfitting individuals further from the mean, and raising questions of fairness. To address these issues, we propose Weight Anonymized Factorization for Federated Learning (WAFFLe), an approach that combines the Indian Buffet Process with a shared dictionary of weight factors for neural networks. Experiments on MNIST, FashionMNIST, and CIFAR-10 demonstrate WAFFLe's significant improvement to local test performance and fairness while simultaneously providing an extra layer of security.

Submitted 13 August, 2020; originally announced August 2020.

arXiv:2006.12013 [pdf, other]

CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information

Authors: Pengyu Cheng, Weituo Hao, Shuyang Dai, Jiachang Liu, Zhe Gan, Lawrence Carin

Abstract: Mutual information (MI) minimization has gained considerable interest in various machine learning tasks. However, estimating and minimizing MI in high-dimensional spaces remains a challenging problem, especially when only samples, rather than distribution forms, are accessible. Previous works mainly focus on MI lower bound approximation, which is not applicable to MI minimization problems. In this paper, we propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information. We provide a theoretical analysis of the properties of CLUB and its variational approximation. Based on this upper bound, we introduce an MI minimization training scheme and further accelerate it with a negative sampling strategy. Simulation studies on Gaussian distributions show the reliable estimation ability of CLUB. Real-world MI minimization experiments, including domain adaptation and information bottleneck, demonstrate the effectiveness of the proposed method. The code is at https://github.com/Linear95/CLUB.

Submitted 23 July, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

Comments: Accepted by the 37th International Conference on Machine Learning (ICML2020)
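
Note: the abstract above does not state the bound itself. The sketch below assumes the contrastive log-ratio form usually associated with CLUB, namely the mean of log q(y_i | x_i) over paired samples minus the mean of log q(y_j | x_i) over all pairs, and this should be checked against the paper. The Gaussian conditional used as q and all function names are hypothetical stand-ins for a learned variational network.

```python
import math


def gaussian_log_density(y, mean, std):
    """Log-density of a univariate normal N(mean, std^2) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (y - mean) ** 2 / (2.0 * std ** 2)


def club_estimate(xs, ys, log_q):
    """Sample-based contrastive log-ratio estimate (assumed form).

    xs, ys: paired samples (x_i, y_i) drawn from the joint distribution.
    log_q: callable log_q(y, x) returning log q(y | x).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    # Average conditional log-likelihood over the paired (positive) samples.
    positive = sum(log_q(y, x) for x, y in zip(xs, ys)) / n
    # Average over all n * n combinations, approximating the product of marginals.
    negative = sum(log_q(y_j, x_i) for x_i in xs for y_j in ys) / (n * n)
    return positive - negative


def unit_gaussian_around_x(y, x):
    """Hypothetical conditional q(y | x) = N(x, 1), for illustration only."""
    return gaussian_log_density(y, mean=x, std=1.0)


print(club_estimate([0.0, 1.0], [0.0, 1.0], unit_gaussian_around_x))
```

The double loop costs O(n^2) evaluations of q; the negative sampling strategy mentioned in the abstract is aimed at reducing that cost and is not shown here.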

arXiv:2006.09543 [pdf, other]

Data Driven Control with Learned Dynamics: Model-Based versus Model-Free Approach

Authors: Wenjian Hao, Yiqiang Han

Abstract: This paper compares two different types of data-driven control methods, representing model-based and model-free approaches. One is a recently proposed method, Deep Koopman Representation for Control (DKRC), which utilizes a deep neural network to map an unknown nonlinear dynamical system to a high-dimensional linear system, which allows for employing state-of-the-art control strategies. The other is a classic model-free control method based on an actor-critic architecture, Deep Deterministic Policy Gradient (DDPG), which has been proved to be effective in various dynamical systems. The comparison is carried out in OpenAI Gym, which provides multiple control environments for benchmark purposes. Two examples are provided for comparison, i.e., classic Inverted Pendulum and Lunar Lander Continuous Control. From the results of the experiments, we compare these two methods in terms of control strategies and the effectiveness under various initialization conditions. We also compare the learned dynamic model from DKRC with the analytical model derived from the Euler-Lagrange Linearization method, which demonstrates the accuracy of the learned model for unknown dynamics from a data-driven sample-efficient approach.

Submitted 16 June, 2020; originally announced June 2020.

Comments: 17 pages, 16 figures

arXiv:2003.01926 [pdf, other]

Transformation Importance with Applications to Cosmology

Authors: Chandan Singh, Wooseok Ha, Francois Lanusse, Vanessa Boehm, Jia Liu, Bin Yu

Abstract: Machine learning lies at the heart of new possibilities for scientific discovery, knowledge generation, and artificial intelligence. Realizing its potential benefits to these fields requires going beyond predictive accuracy and focusing on interpretability. In particular, many scientific problems require interpretations in a domain-specific interpretable feature space (e.g. the frequency domain) whereas attributions to the raw features (e.g. the pixel space) may be unintelligible or even misleading. To address this challenge, we propose TRIM (TRansformation IMportance), a novel approach which attributes importances to features in a transformed space and can be applied post-hoc to a fully trained model. TRIM is motivated by a cosmological parameter estimation problem using deep neural networks (DNNs) on simulated data, but it is generally applicable across domains/models and can be combined with any local interpretation method. In our cosmology example, combining TRIM with contextual decomposition shows promising results for identifying which frequencies a DNN uses, helping cosmologists to understand and validate that the model learns appropriate physical features rather than simulation artifacts.

Submitted 14 June, 2021; v1 submitted 4 March, 2020; originally announced March 2020.

Comments: Published in ICLR 2020 Workshop on Fundamental Science in the era of AI

arXiv:1906.04863 [pdf, other]

Statistical guarantees for local graph clustering

Authors: Wooseok Ha, Kimon Fountoulakis, Michael W. Mahoney

Abstract: Local graph clustering methods aim to find small clusters in very large graphs. These methods take as input a graph and a seed node, and they return as output a good cluster in a running time that depends on the size of the output cluster but that is independent of the size of the input graph. In this paper, we adopt a statistical perspective on local graph clustering, and we analyze the performance of the l1-regularized PageRank method (Fountoulakis et al.) for the recovery of a single target cluster, given a seed node inside the cluster. Assuming the target cluster has been generated by a random model, we present two results. In the first, we show that the optimal support of l1-regularized PageRank recovers the full target cluster, with bounded false positives. In the second, we show that if the seed node is connected solely to the target cluster then the optimal support of l1-regularized PageRank recovers exactly the target cluster. We also show empirically that l1-regularized PageRank has a state-of-the-art performance on many real graphs, demonstrating the superiority of the method. From a computational perspective, we show that the solution path of l1-regularized PageRank is monotonic. This allows for the application of the forward stagewise algorithm, which approximates the solution path in running time that does not depend on the size of the whole graph. Finally, we show that l1-regularized PageRank and approximate personalized PageRank (APPR), another very popular method for local graph clustering, are equivalent in the sense that we can lower and upper bound the output of one with the output of the other. Based on this relation, we establish for APPR similar results to those we establish for l1-regularized PageRank.

Submitted 10 January, 2020; v1 submitted 11 June, 2019; originally announced June 2019.

Comments: 52 pages, 4 figures, 8 tables

arXiv:1903.03712 [pdf, other]

Adaptive Power System Emergency Control using Deep Reinforcement Learning

Authors: Qiuhua Huang, Renke Huang, Weituo Hao, Jie Tan, Rui Fan, Zhenyu Huang

Abstract: Power system emergency control is generally regarded as the last safety net for grid security and resiliency. Existing emergency control schemes are usually designed off-line based on either the conceived "worst" case scenario or a few typical operation scenarios. These schemes are facing significant adaptiveness and robustness issues as increasing uncertainties and variations occur in modern electrical grids. To address these challenges, for the first time, this paper developed novel adaptive emergency control schemes using deep reinforcement learning (DRL), by leveraging the high-dimensional feature extraction and non-linear generalization capabilities of DRL for complex power systems. Furthermore, an open-source platform named RLGC has been designed for the first time to assist the development and benchmarking of DRL algorithms for power system control. Details of the platform and DRL-based emergency control schemes for generator dynamic braking and under-voltage load shedding are presented. Extensive case studies performed in both two-area four-machine system and IEEE 39-Bus system have demonstrated the excellent performance and robustness of the proposed schemes.

Submitted 22 April, 2019; v1 submitted 8 March, 2019; originally announced March 2019.

Comments: 12 pages

arXiv:1712.01995 [pdf]

Short-Term Prediction of Signal Cycle in Actuated-Controlled Corridor Using Sparse Time Series Models

Authors: Bahman Moghimi, Abolfazl Safikhani, Camille Kamga, Wei Hao, JiaQi Ma

Abstract: Traffic signals as part of intelligent transportation systems can play a significant role toward making cities smart. Conventionally, most traffic lights are designed with fixed-time control, which induces a lot of slack time (unused green time). Actuated traffic lights control traffic flow in real time and are more responsive to the variation of traffic demands. For an isolated signal, a family of time series models such as autoregressive integrated moving average (ARIMA) models can be beneficial for predicting the next cycle length. However, when there are multiple signals placed along a corridor with different spacing and configurations, the cycle length variation of such signals is not just related to each signal's values, but it is also affected by the platoon of vehicles coming from neighboring intersections. In this paper, a multivariate time series model is developed to analyze the behavior of signal cycle lengths of multiple intersections placed along a corridor in a fully actuated setup. Five signalized intersections have been modeled along a corridor, with different spacing among them, together with multiple levels of traffic demand. To tackle the high-dimensional nature of the problem, penalized least squares methods are utilized in the estimation procedure to output sparse models. Two proposed sparse time series methods captured the signal data reasonably well, and outperformed the conventional vector autoregressive (VAR) model, in some cases by up to 17%, as well as being more powerful than univariate models such as ARIMA.

Submitted 18 March, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

arXiv:1709.04451 [pdf, other]

Alternating minimization and alternating descent over nonconvex sets

Authors: Wooseok Ha, Rina Foygel Barber

Abstract: We analyze the performance of alternating minimization for loss functions optimized over two variables, where each variable may be restricted to lie in some potentially nonconvex constraint set. This type of setting arises naturally in high-dimensional statistics and signal processing, where the variables often reflect different structures or components within the signals being considered. Our analysis relies on the notion of local concavity coefficients, which has been proposed in Barber and Ha to measure and quantify the concavity of a general nonconvex set. Our results further reveal important distinctions between alternating and non-alternating methods. Since computing the alternating minimization steps may not be tractable for some problems, we also consider an inexact version of the algorithm and provide a set of sufficient conditions to ensure fast convergence of the inexact algorithms. We demonstrate our framework on several examples, including low rank + sparse decomposition and multitask regression, and provide numerical experiments to validate our theoretical results.

Submitted 25 February, 2019; v1 submitted 13 September, 2017; originally announced September 2017.

arXiv:1312.2041 [pdf, other]

doi 10.1093/bioinformatics/btv641

Probabilistic models of genetic variation in structured populations applied to global human studies

Authors: Wei Hao, Minsun Song, John D. Storey

Abstract: Modern population genetics studies typically involve genome-wide genotyping of individuals from a diverse network of ancestries. An important, unsolved problem is how to formulate and estimate probabilistic models of observed genotypes that allow for complex population structure. We formulate two general probabilistic models, and we propose computationally efficient algorithms to estimate them. First, we show how principal component analysis (PCA) can be utilized to estimate a general model that includes the well-known Pritchard-Stephens-Donnelly mixed-membership model as a special case. Noting some drawbacks of this approach, we introduce a new "logistic factor analysis" (LFA) framework that seeks to directly model the logit transformation of probabilities underlying observed genotypes in terms of latent variables that capture population structure. We demonstrate these advances on data from the human genome diversity panel and 1000 genomes project, where we are able to identify SNPs that are highly differentiated with respect to structure while making minimal modeling assumptions.

Submitted 3 March, 2015; v1 submitted 6 December, 2013; originally announced December 2013.

Comments: Wei Hao and Minsun Song contributed equally to this work

Showing 1–19 of 19 results for author: Ha, W