Search | arXiv e-print repository

Winner's Curse Free Robust Mendelian Randomization with Summary Data

Authors: Zhongming Xie, Wanheng Zhang, **gshen Wang, Chong Wu

Abstract: In the past decade, the increased availability of genome-wide association studies summary data has popularized Mendelian Randomization (MR) for conducting causal inference. MR analyses, which use genetic variants as instrumental variables, are known for their robustness against reverse causation bias and unmeasured confounders. Nevertheless, classical MR analyses using summary data may still produce biased causal effect estimates due to the winner's curse and pleiotropy. To address these two issues and establish valid causal conclusions, we propose a unified robust Mendelian Randomization framework with summary data, which systematically removes the winner's curse and screens out invalid genetic instruments with pleiotropic effects. Unlike the existing robust MR literature, our framework delivers valid statistical inference on the causal effect without requiring the genetic pleiotropy effects to follow any parametric distribution or relying on a perfect instrument screening property. Under appropriate conditions, we show that our proposed estimator converges to a normal distribution and that its variance can be well estimated. We demonstrate the performance of our proposed estimator through Monte Carlo simulations and two case studies. The code implementing the procedures is available at https://github.com/ChongWuLab/CARE/.
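For orientation, here is a minimal Python sketch of two textbook ingredients the abstract builds on, not the CARE procedure itself: the classical fixed-effect inverse-variance weighted (IVW) estimator computed from summary statistics, and a small simulation of the winner's curse, in which SNP-exposure effects selected for significance in the same sample come out inflated. The function name `ivw_estimate`, the simulated effect sizes, and the 5.45 z-score cutoff (roughly p < 5e-8) are illustrative assumptions.

```python
import numpy as np

def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW causal estimate from GWAS summary statistics.

    beta_x: SNP-exposure effects; beta_y: SNP-outcome effects;
    se_y: standard errors of beta_y (all 1-D arrays of equal length).
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    w = 1.0 / np.asarray(se_y, dtype=float) ** 2      # inverse-variance weights
    denom = np.sum(w * beta_x ** 2)
    estimate = np.sum(w * beta_x * beta_y) / denom     # weighted regression through the origin
    se = 1.0 / np.sqrt(denom)                          # first-order standard error
    return estimate, se

# Toy check: every SNP has the same ratio beta_y / beta_x = 0.5.
est, se = ivw_estimate([0.1, 0.2, -0.3], [0.05, 0.10, -0.15], [0.01, 0.02, 0.01])

# Winner's curse: SNPs are selected by |z| in the same sample used for estimation.
rng = np.random.default_rng(0)
true_bx = rng.normal(0.0, 0.02, size=5000)             # hypothetical true effects
se_x = 0.01
bx_hat = true_bx + rng.normal(0.0, se_x, size=true_bx.size)
selected = np.abs(bx_hat / se_x) > 5.45                # about genome-wide significance
inflation = np.mean(np.abs(bx_hat[selected])) / np.mean(np.abs(true_bx[selected]))
print(f"IVW estimate {est:.3f}, {selected.sum()} SNPs selected, inflation {inflation:.2f}")
```

An inflation ratio above 1 means the selected SNP-exposure effects are overstated, which in turn biases any ratio-type estimate that divides by them.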

Submitted 10 September, 2023; originally announced September 2023.

arXiv:2307.05592 [pdf, other]

doi 10.1016/j.cma.2023.116721

Functional PCA and Deep Neural Networks-based Bayesian Inverse Uncertainty Quantification with Transient Experimental Data

Authors: Ziyu Xie, Mahmoud Yaseen, Xu Wu

Abstract: Inverse UQ is the process of inversely quantifying the model input uncertainties based on experimental data. This work focuses on developing an inverse UQ process for time-dependent responses, using dimensionality reduction by functional principal component analysis (PCA) and deep neural network (DNN)-based surrogate models. The demonstration is based on the inverse UQ of TRACE physical model parameters using the FEBA transient experimental data. The measurement data are the time-dependent peak cladding temperature (PCT). Since the quantity of interest (QoI) is time-dependent and therefore corresponds to an infinite-dimensional response, PCA is used to reduce the QoI dimension while preserving the transient profile of the PCT, in order to make the inverse UQ process more efficient. However, conventional PCA applied directly to the PCT time series profiles can hardly represent the data precisely because of the sudden temperature drop at the time of quenching. As a result, a functional alignment method is used to separate the phase and amplitude information of the transient PCT profiles before dimensionality reduction. DNNs are then trained using PC scores from functional PCA to build surrogate models of TRACE in order to reduce the computational cost of Markov Chain Monte Carlo sampling. Bayesian neural networks are used to estimate the uncertainties of the DNN surrogate model predictions. In this study, we compare four inverse UQ processes with different dimensionality reduction methods and surrogate models. The proposed approach shows an improvement in reducing the dimension of the TRACE transient simulations, and the forward propagation of the inverse UQ results shows better agreement with the experimental data.
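The dimensionality-reduction step can be illustrated with ordinary PCA on a matrix of time series sampled on a common grid. This minimal NumPy sketch rests on stated assumptions: it omits the functional alignment (phase/amplitude separation) that the abstract says PCT curves with a quench drop require, and the helper names `pca_scores` and `reconstruct` and the synthetic curves are hypothetical. The returned PC scores are the kind of low-dimensional outputs a surrogate model could be trained to predict.

```python
import numpy as np

def pca_scores(curves, n_components):
    """PCA of curves sampled on a common time grid (rows = samples)."""
    X = np.asarray(curves, dtype=float)
    mean = X.mean(axis=0)                      # mean curve
    Xc = X - mean                              # center each time point
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]             # principal directions over time
    scores = Xc @ components.T                 # low-dimensional PC scores
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, components, mean, explained

def reconstruct(scores, components, mean):
    """Map PC scores back to full time series."""
    return mean + scores @ components

# Synthetic transients driven by two random amplitudes (hypothetical data).
t = np.linspace(0.0, 1.0, 200)
rng = np.random.default_rng(0)
a, b = rng.uniform(0.5, 1.5, size=(2, 30))
curves = a[:, None] * np.exp(-3.0 * t) + b[:, None] * t

scores, comps, mean, explained = pca_scores(curves, n_components=2)
print(scores.shape, explained.sum())           # 200 time points compressed to 2 scores
```

Because these synthetic curves vary along only two directions, two components reproduce them exactly; a sharp, sample-dependent quench time would break this, which is why alignment precedes PCA in the abstract.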

Submitted 10 July, 2023; originally announced July 2023.

Comments: 31 pages, 21 figures

arXiv:2303.18067 [pdf, other]

Rediscover Climate Change during Global Warming Slowdown via Wasserstein Stability Analysis

Authors: Zhiang Xie, Dongwei Chen, Puxi Li

Abstract: Climate change is one of the key topics in climate science. However, previous research has predominantly concentrated on changes in mean values, and few studies examine changes in the Probability Distribution Function (PDF). In this study, a novel method called Wasserstein Stability Analysis (WSA) is developed to identify PDF changes, especially shifts in extreme events and variations in non-linear physical value constraints in climate change. WSA is applied to the 21st-century warming slowdown period and compared with traditional mean-value trend analysis. The results indicate that despite no significant trend, the central-eastern Pacific experienced a decline in hot extremes and an increase in cold extremes, indicating a La Niña-like temperature shift. Further analysis at two Arctic locations suggests that sea ice severely restricts the hot extremes of surface air temperature, and this impact is diminishing as sea ice melts. Overall, by detecting PDF changes, WSA is a useful method for rediscovering climate change.
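The abstract does not state how WSA is computed, so the following is only a hedged illustration of the metric the method is named after: the 1-Wasserstein distance between two one-dimensional empirical distributions. For equal sample sizes the optimal transport plan pairs the order statistics, so the distance is the mean absolute difference of the sorted samples. The function name `wasserstein1_empirical` and the toy data are hypothetical; SciPy's `scipy.stats.wasserstein_distance` covers the general case.

```python
import numpy as np

def wasserstein1_empirical(x, y):
    """W1 distance between two 1-D samples of equal size."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))     # match i-th smallest with i-th smallest

# Two periods with the same mean but wider tails in the second one.
before = np.array([-1.0, -0.5, 0.5, 1.0])
after = np.array([-2.0, -0.5, 0.5, 2.0])
print(before.mean() - after.mean())           # 0.0: no change in the mean
print(wasserstein1_empirical(before, after))  # 0.5: the distribution did change
```

This mirrors the abstract's point: a mean-value trend analysis sees nothing here, while a distributional distance picks up the change in the extremes.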

Submitted 28 May, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: 14 pages, 4 figures, 1 Algorithm, and 3-page supplementary materials

arXiv:2212.02083 [pdf, other]

On the Overlooked Structure of Stochastic Gradients

Authors: Zeke Xie, Qian-Yuan Tang, Mingming Sun, ** Li

Abstract: Stochastic gradients are closely related to both the optimization and the generalization of deep neural networks (DNNs). Some works attempted to explain the success of stochastic optimization for deep learning by the arguably heavy-tailed properties of gradient noise, while other works presented theoretical and empirical evidence against the heavy-tail hypothesis on gradient noise. Unfortunately, formal statistical tests for analyzing the structure and heavy tails of stochastic gradients in deep learning are still under-explored. In this paper, we make two main contributions. First, we conduct formal statistical tests on the distribution of stochastic gradients and gradient noise across both parameters and iterations. Our statistical tests reveal that dimension-wise gradients usually exhibit power-law heavy tails, while iteration-wise gradients and the stochastic gradient noise caused by minibatch training usually do not. Second, we discover that the covariance spectra of stochastic gradients have power-law structures overlooked by previous studies, and we present their theoretical implications for the training of DNNs. While previous studies believed that the anisotropic structure of stochastic gradients matters to deep learning, they did not expect the gradient covariance to have such an elegant mathematical structure. Our work challenges the existing belief and provides novel insights into the structure of stochastic gradients in deep learning.
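As a hedged illustration of what "power-law heavy tails" means operationally, the sketch below uses the Hill estimator of a tail index, a standard tool that is not necessarily the test used in the paper. A smaller estimated index means a heavier tail. The function name `hill_tail_index`, the choice of `k`, and the synthetic Pareto and Gaussian samples are assumptions for illustration.

```python
import numpy as np

def hill_tail_index(samples, k):
    """Hill estimate of the tail index alpha from the k largest |samples|."""
    x = np.sort(np.abs(np.asarray(samples, dtype=float)))[::-1]  # descending order
    if not 0 < k < x.size:
        raise ValueError("k must satisfy 0 < k < len(samples)")
    log_excess = np.log(x[:k]) - np.log(x[k])  # log ratios over the (k+1)-th largest
    return 1.0 / np.mean(log_excess)

rng = np.random.default_rng(0)
n, k = 100_000, 2_000
pareto = 1.0 + rng.pareto(2.0, size=n)          # classical Pareto tail, alpha = 2
gauss = rng.standard_normal(size=n)             # light, non-power-law tail
print(hill_tail_index(pareto, k))               # close to 2
print(hill_tail_index(gauss, k))                # noticeably larger
```

NumPy's `pareto` draws from the Lomax (Pareto II) distribution, so adding 1 gives a classical Pareto sample with minimum 1.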

Submitted 20 October, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

Comments: NeurIPS 2023. 20 pages, 16 figures, 17 Tables; Key Words: Deep Learning, Stochastic Gradient, Optimization. arXiv admin note: text overlap with arXiv:2201.13011

arXiv:2208.07959 [pdf, other]

Variable Selection in Latent Regression IRT Models via Knockoffs: An Application to International Large-scale Assessment in Education

Authors: Zilong Xie, Yunxiao Chen, Matthias von Davier, Haolei Weng

Abstract: International large-scale assessments (ILSAs) play an important role in educational research and policy making. They collect valuable data on education quality and performance development across many education systems, giving countries the opportunity to share techniques, organizational structures, and policies that have proven efficient and successful. To gain insights from ILSA data, we identify non-cognitive variables associated with students' academic performance. This problem has three analytical challenges: 1) academic performance is measured by cognitive items under a matrix sampling design; 2) there are many missing values in the non-cognitive variables; and 3) the large number of non-cognitive variables raises a multiple comparisons problem. We consider an application to the Programme for International Student Assessment (PISA), aiming to identify non-cognitive variables associated with students' performance in science. We formulate it as a variable selection problem under a general latent variable model framework and further propose a knockoff method that conducts variable selection with a controlled error rate for false selections.
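The final selection step of a generic knockoff procedure can be sketched as follows. This is the standard knockoff+ threshold applied to precomputed statistics `W` (positive values favor the original variable over its knockoff), not the authors' latent-regression IRT method; constructing valid knockoffs and statistics for that model is the hard part and is not shown. The function name and toy values are hypothetical.

```python
import numpy as np

def knockoff_plus_select(W, q):
    """Indices selected by the knockoff+ threshold at target FDR level q."""
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):          # candidate thresholds, ascending
        # Estimated false discovery proportion if we select {j : W_j >= t}.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)                # no threshold satisfies the bound

W = np.array([5.0, 4.0, 3.0, 2.0, 1.0, -0.5])
print(knockoff_plus_select(W, q=0.25))            # [0 1 2 3 4]
```

At t = 0.5 the estimate is (1 + 1) / 5 = 0.4, too high; at t = 1 it drops to (1 + 0) / 5 = 0.2, so the five positive statistics are selected.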

Submitted 14 November, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

arXiv:2207.02943 [pdf, other]

Degrees of Freedom and Information Criteria for the Synthetic Control Method

Authors: Guillaume Allaire Pouliot, Zhen Xie

Abstract: We provide an analytical characterization of the model flexibility of the synthetic control method (SCM) in the familiar form of degrees of freedom. We obtain estimable information criteria. These may be used to circumvent cross-validation when selecting either the weighting matrix in the SCM with covariates, or the tuning parameter in model averaging or penalized variants of SCM. We assess the impact of car license rationing in Tianjin and make a novel use of SCM: while a natural match is available, it and the other donors are noisy, inviting the use of SCM to average over approximately matching donors. The very large number of candidate donors calls for model averaging or penalized variants of SCM, and, with short pre-treatment series, model selection via information criteria outperforms model selection via cross-validation.
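For readers unfamiliar with SCM, the basic fit chooses nonnegative donor weights summing to one that best reproduce the treated unit's pre-treatment outcomes. The sketch below solves that constrained least-squares problem by projected gradient descent onto the probability simplex. It leaves out covariates, the weighting matrix, and the penalized and model-averaging variants the paper analyzes; the solver choice, function names, and synthetic data are assumptions, not the authors' implementation.

```python
import numpy as np

def project_to_simplex(w):
    """Euclidean projection of w onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(w)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, w.size + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(w - tau, 0.0)

def synthetic_control_weights(y_treated, Y_donors, n_iter=5000):
    """Minimize 0.5 * ||y_treated - Y_donors @ w||^2 over the simplex.

    y_treated: (T0,) pre-treatment outcomes of the treated unit.
    Y_donors: (T0, J) pre-treatment outcomes of J donor units.
    """
    J = Y_donors.shape[1]
    w = np.full(J, 1.0 / J)                              # start from equal weights
    step = 1.0 / np.linalg.norm(Y_donors, 2) ** 2        # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = Y_donors.T @ (Y_donors @ w - y_treated)
        w = project_to_simplex(w - step * grad)
    return w

rng = np.random.default_rng(0)
Y_donors = rng.normal(size=(12, 3))                      # hypothetical donor series
y_treated = Y_donors @ np.array([0.3, 0.7, 0.0])         # treated unit = known mix
w = synthetic_control_weights(y_treated, Y_donors)
print(np.round(w, 3))
```

With many candidate donors and a short pre-treatment series, this unpenalized fit can interpolate noise, which is the setting where the abstract argues for penalized or averaged variants tuned by information criteria.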

Submitted 6 July, 2022; originally announced July 2022.

arXiv:2105.05370 [pdf, other]

doi 10.1016/j.anucene.2021.108782

Bayesian Inverse Uncertainty Quantification of a MOOSE-based Melt Pool Model for Additive Manufacturing Using Experimental Data

Authors: Ziyu Xie, Wen Jiang, Congjian Wang, Xu Wu

Abstract: Additive manufacturing (AM) technology is being increasingly adopted in a wide variety of application areas due to its ability to rapidly produce, prototype, and customize designs. AM techniques afford significant opportunities in regard to nuclear materials, including an accelerated fabrication process and reduced cost. High-fidelity modeling and simulation (M&S) of AM processes is being developed in Idaho National Laboratory (INL)'s Multiphysics Object-Oriented Simulation Environment (MOOSE) to support AM process optimization and provide a fundamental understanding of the various physical interactions involved. In this paper, we employ Bayesian inverse uncertainty quantification (UQ) to quantify the input uncertainties in a MOOSE-based melt pool model for AM. Inverse UQ is the process of inversely quantifying the input uncertainties while keeping model predictions consistent with the measurement data. The inverse UQ process takes into account uncertainties from the model, code, and data while simultaneously characterizing the uncertain distributions in the input parameters, rather than merely providing best-fit point estimates. We employ measurement data on melt pool geometry (lengths and depths) to quantify the uncertainties in several melt pool model parameters. Simulation results using the posterior uncertainties show improved agreement with experimental data compared with those using the prior nominal values. The resulting parameter uncertainties can be used to replace expert opinions in future uncertainty, sensitivity, and validation studies.
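The core of Bayesian inverse UQ can be illustrated on a toy problem: sample the posterior of an uncertain input parameter given noisy measurements of a model output. The sketch below uses a random-walk Metropolis-Hastings sampler. `toy_model`, the uniform prior, the noise level, and the step size are hypothetical stand-ins for the MOOSE melt pool model and its data, and the sketch ignores the model and code uncertainty terms that the abstract says the full inverse UQ process accounts for.

```python
import numpy as np

def toy_model(theta, x):
    """Hypothetical stand-in for an expensive simulator: linear in theta."""
    return theta * x

def metropolis_hastings(log_post, theta0, n_steps, step, rng):
    """Random-walk Metropolis-Hastings for a scalar parameter."""
    chain = np.empty(n_steps)
    theta, lp = theta0, log_post(theta0)
    for i in range(n_steps):
        prop = theta + step * rng.standard_normal()      # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:         # accept with prob min(1, ratio)
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain

rng = np.random.default_rng(1)
x = np.linspace(1.0, 2.0, 20)
sigma = 0.1                                   # assumed measurement noise
data = toy_model(1.5, x) + sigma * rng.standard_normal(x.size)

def log_post(theta):
    if not 0.0 < theta < 5.0:                 # uniform prior on (0, 5)
        return -np.inf
    resid = data - toy_model(theta, x)
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

chain = metropolis_hastings(log_post, 1.0, 5000, 0.05, rng)
posterior = chain[1000:]                      # discard burn-in
print(posterior.mean(), posterior.std())      # a distribution, not just a best fit
```

The posterior spread is the quantity that would replace an expert-specified input uncertainty in later forward UQ or sensitivity studies.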

Submitted 17 May, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

Comments: 26 pages, 11 figures

arXiv:2105.00553 [pdf, other]

doi 10.1016/j.nucengdes.2021.111423

Towards Improving the Predictive Capability of Computer Simulations by Integrating Inverse Uncertainty Quantification and Quantitative Validation with Bayesian Hypothesis Testing

Authors: Ziyu Xie, Farah Alsafadi, Xu Wu

Abstract: The Best Estimate plus Uncertainty (BEPU) approach for nuclear systems modeling and simulation requires that the prediction uncertainty be quantified in order to prove that the investigated design stays within acceptance criteria. A rigorous Uncertainty Quantification (UQ) process should simultaneously consider multiple sources of quantifiable uncertainties: (1) parameter uncertainty due to randomness or lack of knowledge; (2) experimental uncertainty due to measurement noise; (3) model uncertainty caused by missing or incomplete physics and numerical approximation errors; and (4) code uncertainty when surrogate models are used. In this paper, we propose a comprehensive framework to integrate results from inverse UQ and quantitative validation to provide robust predictions so that all these sources of uncertainties can be taken into consideration. Inverse UQ quantifies the parameter uncertainties based on experimental data while taking into account uncertainties from the model, code, and measurement. In the validation step, we use a quantitative validation metric based on Bayesian hypothesis testing. The resulting metric, called the Bayes factor, is then used to form weighting factors to combine the prior and posterior knowledge of the parameter uncertainties in a Bayesian model averaging process. In this way, model predictions integrate the results from inverse UQ and validation to account for all available sources of uncertainties. This framework is a step towards addressing the ANS Nuclear Grand Challenge on "Simulation/Experimentation" by bridging the gap between models and data.

Submitted 2 May, 2021; originally announced May 2021.

Comments: 29 pages, 11 figures

arXiv:2104.12919 [pdf, other]

doi 10.1016/j.nucengdes.2021.111460

A Comprehensive Survey of Inverse Uncertainty Quantification of Physical Model Parameters in Nuclear System Thermal-Hydraulics Codes

Authors: Xu Wu, Ziyu Xie, Farah Alsafadi, Tomasz Kozlowski

Abstract: Uncertainty Quantification (UQ) is an essential step in computational model validation because assessment of the model accuracy requires a concrete, quantifiable measure of uncertainty in the model predictions. In the nuclear community, UQ generally means forward UQ (FUQ), in which the information flows from the inputs to the outputs. Inverse UQ (IUQ), in which the information flows from the model outputs and experimental data to the inputs, is an equally important component of UQ but has been significantly underrated until recently. FUQ requires knowledge of the input uncertainties, which have been specified by expert opinion or user self-evaluation. IUQ is defined as the process of inversely quantifying the input uncertainties based on experimental data. This review paper aims to provide a comprehensive and comparative discussion of the major aspects of the IUQ methodologies that have been used on the physical models in system thermal-hydraulics codes. IUQ methods can be categorized into three main groups: frequentist (deterministic), Bayesian (probabilistic), and empirical (design-of-experiments). We use eight metrics to evaluate an IUQ method: solidity, complexity, accessibility, independence, flexibility, comprehensiveness, transparency, and tractability. Twelve IUQ methods are reviewed, compared, and evaluated based on these eight metrics. Such a comparative evaluation provides guidance for users to select a proper IUQ method for the IUQ problem under investigation.

Submitted 26 April, 2021; originally announced April 2021.

Comments: 76 pages, 10 figures

arXiv:2010.13520 [pdf, other]

Differentially Private (Gradient) Expectation Maximization Algorithm with Statistical Guarantees

Authors: Di Wang, Jiahao Ding, Lijie Hu, Zejun Xie, Miao Pan, **hui Xu

Abstract: (Gradient) Expectation Maximization (EM) is a widely used algorithm for maximum likelihood estimation in mixture models and incomplete data problems. A major challenge facing this popular technique is how to effectively preserve the privacy of sensitive data. Previous research on this problem has already led to the discovery of some Differentially Private (DP) algorithms for (Gradient) EM. However, unlike in the non-private case, existing techniques are not yet able to provide finite sample statistical guarantees. To address this issue, we propose in this paper the first DP version of the (Gradient) EM algorithm with statistical guarantees. Moreover, we apply our general framework to three canonical models: Gaussian Mixture Model (GMM), Mixture of Regressions Model (MRM), and Linear Regression with Missing Covariates (RMC). Specifically, for GMM in the DP model, our estimation error is near optimal in some cases. For the other two models, we provide the first finite sample statistical guarantees. Our theory is supported by thorough numerical experiments.
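A common building block for differentially private gradient methods is the Gaussian mechanism: clip per-sample gradients, average them, and add Gaussian noise calibrated to the sensitivity of the average. The sketch below shows that step with the classical calibration sigma = sqrt(2 ln(1.25 / delta)) * sensitivity / epsilon, valid for 0 < epsilon < 1. It is not the authors' algorithm; the function names, the clipping bound, and the sensitivity argument (replacing one of n samples) are assumptions. Applying it at every EM iteration would also require splitting the privacy budget across iterations, which is not shown.

```python
import numpy as np

def gaussian_mechanism_sigma(l2_sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (assumes 0 < epsilon < 1)."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("classical calibration needs 0 < epsilon < 1 and 0 < delta < 1")
    return np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / epsilon

def private_mean_gradient(per_sample_grads, clip, epsilon, delta, rng):
    """Clip per-sample gradients to L2 norm <= clip, average, add Gaussian noise."""
    g = np.asarray(per_sample_grads, dtype=float)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g = g * np.minimum(1.0, clip / np.maximum(norms, 1e-12))   # per-row clipping
    n = g.shape[0]
    sensitivity = 2.0 * clip / n     # replacing one sample moves the mean by <= 2*clip/n
    sigma = gaussian_mechanism_sigma(sensitivity, epsilon, delta)
    return g.mean(axis=0) + sigma * rng.standard_normal(g.shape[1])

rng = np.random.default_rng(0)
grads = rng.normal(size=(1000, 3))          # hypothetical per-sample EM gradients
noisy = private_mean_gradient(grads, clip=1.0, epsilon=0.5, delta=1e-5, rng=rng)
print(noisy)
```

Clipping is what makes the sensitivity finite; the noise then shrinks as 1/n, which is one reason finite-sample guarantees depend on the sample size.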

Submitted 16 January, 2022; v1 submitted 21 October, 2020; originally announced October 2020.

Comments: Submitted. arXiv admin note: text overlap with arXiv:2010.09576

arXiv:2009.11469 [pdf, other]

Revisiting Graph Convolutional Network on Semi-Supervised Node Classification from an Optimization Perspective

Authors: Hongwei Zhang, Ti** Yan, Zenjun Xie, Yuanqing Xia, Yuan Zhang

Abstract: Graph convolutional networks (GCNs) have achieved promising performance on various graph-based tasks. However, they suffer from over-smoothing when more layers are stacked. In this paper, we present a quantitative study of this observation and develop novel insights towards deeper GCNs. First, we interpret the current graph convolutional operations from an optimization perspective and argue that over-smoothing is mainly caused by the naive first-order approximation of the solution to the optimization problem. Subsequently, we introduce two metrics to measure over-smoothing on node-level tasks. Specifically, we calculate the fractions of the overall pairwise distance contributed by connected nodes and by disconnected nodes, respectively. Based on our theoretical and empirical analysis, we establish a universal theoretical framework of GCN from an optimization perspective and derive a novel convolutional kernel named GCN+, which has fewer parameters while inherently relieving over-smoothing. Extensive experiments on real-world datasets demonstrate the superior performance of GCN+ over state-of-the-art baseline methods on node classification tasks.
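The over-smoothing effect can be reproduced with the standard GCN propagation rule alone. The sketch below repeatedly multiplies node features by the symmetrically normalized adjacency with self-loops, D^-1/2 (A + I) D^-1/2, and drops weights and nonlinearities (an assumption that isolates the propagation step). On a connected graph the features converge to vectors proportional to the square root of each node's degree (counting self-loops), so after removing that scaling all nodes look the same. This illustrates the problem, not the GCN+ kernel; the function names and the 4-node path graph are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """D^-1/2 (A + I) D^-1/2, the standard GCN propagation matrix."""
    A_hat = np.asarray(A, dtype=float) + np.eye(len(A))   # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def propagate(A, X, n_layers):
    """Apply n_layers of linear GCN propagation (no weights, no nonlinearity)."""
    S = normalized_adjacency(A)
    for _ in range(n_layers):
        X = S @ X
    return X

# Path graph 0-1-2-3 with random 2-D node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
X = np.random.default_rng(0).normal(size=(4, 2))
deg = A.sum(axis=1) + 1                                    # degrees with self-loops
for layers in (1, 5, 200):
    H = propagate(A, X, layers) / np.sqrt(deg)[:, None]    # remove the degree scaling
    spread = np.max(np.abs(H - H.mean(axis=0)))
    print(layers, spread)                                  # shrinks toward 0
```

Each propagation step is a fixed low-pass filter; stacking many of them discards every component except the one tied to the top eigenvector, which is the behavior the abstract attributes to the first-order approximation.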

Submitted 24 September, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

arXiv:2008.11832 [pdf, other]

doi 10.1145/3295500.3356147

Adaptive Neural Network-Based Approximation to Accelerate Eulerian Fluid Simulation

Authors: Wenqian Dong, Jie Liu, Zhen Xie, Dong Li

Abstract: Eulerian fluid simulation is an important HPC application, and neural networks have been applied to accelerate it. However, current methods that accelerate fluid simulation with neural networks lack flexibility and generalization. In this paper, we tackle this limitation and aim to enhance the applicability of neural networks in Eulerian fluid simulation. We introduce Smartfluidnet, a framework that automates model generation and application. Given an existing neural network as input, Smartfluidnet generates multiple neural networks before the simulation to meet the execution time and simulation quality requirements. During the simulation, Smartfluidnet dynamically switches among the neural networks to make a best effort to meet the user's requirement on simulation quality. Evaluating with 20,480 input problems, we show that Smartfluidnet achieves 1.46x and 590x speedup compared with a state-of-the-art neural network model and the original fluid simulation, respectively, on an NVIDIA Titan X Pascal GPU, while providing better simulation quality than the state-of-the-art model.

Submitted 26 August, 2020; originally announced August 2020.

arXiv:2006.15815 [pdf, other]

Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

Authors: Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

Abstract: Adaptive Moment Estimation (Adam), which combines Adaptive Learning Rate and Momentum, is arguably the most popular stochastic optimizer for accelerating the training of deep neural networks. However, it is empirically known that Adam often generalizes worse than Stochastic Gradient Descent (SGD). The purpose of this paper is to unveil the mystery of this behavior in the diffusion theoretical framework. Specifically, we disentangle the effects of Adaptive Learning Rate and Momentum of the Adam dynamics on saddle-point escaping and flat minima selection. We prove that Adaptive Learning Rate can escape saddle points efficiently, but cannot select flat minima as SGD does. In contrast, Momentum provides a drift effect that helps the training process pass through saddle points, and has almost no effect on flat minima selection. This partly explains why SGD (with Momentum) generalizes better, while Adam generalizes worse but converges faster. Furthermore, motivated by the analysis, we design a novel adaptive optimization framework named Adaptive Inertia, which uses parameter-wise adaptive inertia to accelerate training and provably favors flat minima as well as SGD does. Our extensive experiments demonstrate that the proposed adaptive inertia method can generalize significantly better than SGD and conventional adaptive gradient methods.
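To make the decomposition in the abstract concrete, here is the standard Adam update written so that its two ingredients are visible: the first-moment estimate `m` (Momentum) and the division by the square root of the second-moment estimate `v` (Adaptive Learning Rate). This is textbook Adam with its usual default hyperparameters, shown next to SGD with momentum for contrast. It is not the paper's Adaptive Inertia optimizer, whose update rule the abstract does not give, and the function names are hypothetical.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step index used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad          # Momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    # Adaptive Learning Rate: each coordinate is rescaled by 1 / sqrt(v_hat).
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def sgd_momentum_step(theta, grad, buf, lr=1e-3, mu=0.9):
    """SGD with heavy-ball momentum: same momentum, no per-coordinate rescaling."""
    buf = mu * buf + grad
    return theta - lr * buf, buf

theta = np.zeros(2)
grad = np.array([2.0, -0.5])
theta_adam, m, v = adam_step(theta, grad, np.zeros(2), np.zeros(2), t=1)
theta_sgd, _ = sgd_momentum_step(theta, grad, np.zeros(2))
print(theta_adam)   # approximately [-0.001, 0.001]: step size ignores gradient scale
print(theta_sgd)    # approximately [-0.002, 0.0005]: step proportional to the gradient
```

The per-coordinate rescaling is what removes the dependence on gradient magnitude, which is the part the abstract links to Adam's weaker preference for flat minima.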

Submitted 14 June, 2022; v1 submitted 29 June, 2020; originally announced June 2020.

Comments: ICML2022, Long Oral Presentation, 30 pages, 14 figures, Key Words: Deep Learning Theory, Optimization, Adam, Adaptive Inertia, Flat Minima

arXiv:2005.08704 [pdf, other]

A Biologically Inspired Feature Enhancement Framework for Zero-Shot Learning

Authors: Zhongwu Xie, Weipeng Cao, Xizhao Wang, Zhong Ming, **g**g Zhang, Jiyong Zhang

Abstract: Most Zero-Shot Learning (ZSL) algorithms currently use pre-trained models as their feature extractors, which are usually deep neural networks trained on the ImageNet data set. The richness of the feature information embedded in the pre-trained models can help the ZSL model extract more useful features from its limited training samples. However, the training data set of the current ZSL task sometimes differs too much from the ImageNet data set, in which case the pre-trained models may bring no obvious benefit or may even hurt the performance of the ZSL model. To solve this problem, this paper proposes a biologically inspired feature enhancement framework for ZSL. Specifically, we design a dual-channel learning framework that uses auxiliary data sets to enhance the feature extractor of the ZSL model and propose a novel method to guide the selection of the auxiliary data sets based on the knowledge of biological taxonomy. Extensive experimental results show that our proposed method can effectively improve the generalization ability of the ZSL model and achieve state-of-the-art results on three benchmark ZSL tasks. We also explain the experimental phenomena through feature visualization.

Submitted 13 May, 2020; originally announced May 2020.

arXiv:2003.01762 [pdf, other]

FLAME: A Self-Adaptive Auto-labeling System for Heterogeneous Mobile Processors

Authors: Jie Liu, Jiawen Liu, Zhen Xie, Dong Li

Abstract: Accurately and efficiently labeling data on a mobile device is critical for the success of training machine learning models on mobile devices. Auto-labeling data on mobile devices is challenging because data is usually generated incrementally and may include unknown labels. Furthermore, the rich hardware heterogeneity of mobile devices creates challenges for efficiently executing auto-labeling workloads. In this paper, we introduce Flame, an auto-labeling system that can label non-stationary data with unknown labels. Flame includes a runtime system that efficiently schedules and executes auto-labeling workloads on heterogeneous mobile processors. Evaluating Flame with eight datasets on a smartphone, we demonstrate that Flame enables auto-labeling with high labeling accuracy and high performance.

Submitted 3 March, 2020; originally announced March 2020.

arXiv:2002.03495 [pdf, other]

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima

Authors: Zeke Xie, Issei Sato, Masashi Sugiyama

Abstract: Stochastic Gradient Descent (SGD) and its variants are mainstream methods for training deep networks in practice. SGD is known to find a flat minimum that often generalizes well. However, it is mathematically unclear how deep learning can select a flat minimum among so many minima. To answer this question quantitatively, we develop a density diffusion theory (DDT) to reveal how minima selection quantitatively depends on the minima sharpness and the hyperparameters. To the best of our knowledge, we are the first to theoretically and empirically prove that, benefiting from the Hessian-dependent covariance of stochastic gradient noise, SGD favors flat minima exponentially more than sharp minima, while Gradient Descent (GD) with injected white noise favors flat minima only polynomially more than sharp minima. We also reveal that either a small learning rate or large-batch training requires exponentially many iterations to escape from minima in terms of the ratio of the batch size to the learning rate. Thus, large-batch training cannot search flat minima efficiently in a realistic computational time.

Submitted 15 January, 2021; v1 submitted 9 February, 2020; originally announced February 2020.

Comments: ICLR 2021; 28 pages; 19 figures

arXiv:1912.05903

Prediction and optimization of NaV1.7 inhibitors based on machine learning methods

Authors: Weikaixin Kong, Xinyu Tu, Zhengwei Xie, Zhuo Huang

Abstract: We used machine learning methods to predict NaV1.7 inhibitors and found that the RF-CDK model performed best on the imbalanced dataset. Using the RF-CDK model to screen drugs, we obtained an effective compound, K1, which we verified with the cell patch clamp method. However, because the model evaluation method in this article is not comprehensive enough, a lot of research work remains to be performed, such as comparison with other existing methods. The target protein has multiple active sites, which requires further research. More detailed models are needed to account for this biological process and to be compared with the current results; their absence is an error in this article. We therefore want to withdraw this article.

Submitted 15 February, 2020; v1 submitted 29 November, 2019; originally announced December 2019.

Comments: The evaluation of the model in the results section of this article is not comprehensive enough. We will carry out further work. The article needs to be polished. There are certain disadvantages to the molecular optimization method. The discussion part is not deep enough, so withdrawal is needed.

arXiv:1912.03015

Learning to Correspond Dynamical Systems

Authors: Nam Hee Kim, Zhaoming Xie, Michiel van de Panne

Abstract: Many dynamical systems exhibit similar structure, as often captured by hand-designed simplified models that can be used for analysis and control. We develop a method for learning to correspond pairs of dynamical systems via a learned latent dynamical system. Given trajectory data from two dynamical systems, we learn a shared latent state space and a shared latent dynamics model, along with an encoder-decoder pair for each of the original systems. With the learned correspondences in place, we can use a simulation of one system to produce an imagined motion of its counterpart. We can also simulate in the learned latent dynamics and synthesize the motions of both corresponding systems, as a form of bisimulation. We demonstrate the approach using pairs of controlled bipedal walkers, as well as by pairing a walker with a controlled pendulum.

Submitted 4 June, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

arXiv:1812.00335

GAN-EM: GAN based EM learning framework

Authors: Wentian Zhao, Shaojie Wang, Zhihuai Xie, **g Shi, Chenliang Xu

Abstract: The expectation-maximization (EM) algorithm finds maximum likelihood solutions for models with latent variables. A typical example is the Gaussian Mixture Model (GMM), which requires a Gaussian assumption; however, natural images are highly non-Gaussian, so GMM cannot be applied to perform clustering in pixel space. To overcome this limitation, we propose a GAN-based EM learning framework that can maximize the likelihood of images and estimate the latent variables with only the constraint of L-Lipschitz continuity. We call this model GAN-EM, a framework for image clustering, semi-supervised classification and dimensionality reduction. In the M-step, we design a novel loss function for the GAN discriminator to perform maximum likelihood estimation (MLE) on data with soft class label assignments. Specifically, a conditional generator captures the data distribution for $K$ classes, and a discriminator tells whether a sample is real or fake for each class. Since our model is unsupervised, the class label of real data is regarded as a latent variable, which is estimated by an additional network (E-net) in the E-step. The proposed GAN-EM achieves state-of-the-art clustering and semi-supervised classification results on MNIST, SVHN and CelebA, as well as image quality comparable to that of other recently developed generative models.

Submitted 2 December, 2018; originally announced December 2018.
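The GAN-EM abstract builds on the classic EM algorithm for a Gaussian mixture. As background only, here is a minimal NumPy sketch of that classic E-step/M-step loop for a one-dimensional GMM. It is not the GAN-EM method; the function name `em_gmm_1d`, the quantile initialisation and the small variance floor are illustrative assumptions.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=200):
    """Fit a 1-D Gaussian mixture with k components using classic EM."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Deterministic initialisation (a heuristic): means at spread-out quantiles,
    # every component starts with the overall variance and an equal weight.
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = p(z_i = j | x_i, current parameters).
        log_p = (np.log(pi)
                 - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)  # avoid underflow in exp
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form maximum likelihood updates given the soft labels.
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6  # variance floor
    return pi, mu, var

# Usage: two well-separated clusters; the fitted means should be close to -3 and 3.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
weights, means, variances = em_gmm_1d(data, k=2)
```

The E-step is exactly where the Gaussian assumption enters; the abstract's point is that this density model is a poor fit for raw pixels, which GAN-EM replaces with a GAN.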

arXiv:1809.00083

Predicting protein inter-residue contacts using composite likelihood maximization and deep learning

Authors: Haicang Zhang, Qi Zhang, Fusong Ju, Jianwei Zhu, Shiwei Sun, Yujuan Gao, Ziwei Xie, Minghua Deng, Shiwei Sun, Wei-Mou Zheng, Dongbo Bu

Abstract: Accurate prediction of inter-residue contacts of a protein is important for calculating its tertiary structure. Analysis of co-evolutionary events among residues has proved effective for inferring inter-residue contacts. The Markov random field (MRF) technique, although widely used for contact prediction, suffers from the following dilemma: the actual likelihood function of an MRF is accurate but time-consuming to calculate, whereas approximations to the actual likelihood, such as pseudo-likelihood, are efficient to calculate but inaccurate. Thus, achieving both accuracy and efficiency simultaneously remains a challenge. In this study, we present such an approach (called clmDCA) for contact prediction. Unlike plmDCA, which uses pseudo-likelihood, i.e., the product of the conditional probabilities of individual residues, our approach uses composite likelihood, i.e., the product of the conditional probabilities of all residue pairs. Composite likelihood has been theoretically proved to be a better approximation to the actual likelihood function than pseudo-likelihood. Meanwhile, composite likelihood is still efficient to maximize, thus ensuring the efficiency of clmDCA. We present comprehensive experiments on popular benchmark datasets, including the PSICOV dataset and the CASP-11 dataset, to show that: i) clmDCA alone outperforms the existing MRF-based approaches in prediction accuracy; ii) when equipped with a deep learning technique for refinement, the prediction accuracy of clmDCA is further significantly improved, suggesting the suitability of clmDCA for subsequent refinement procedures. We further present successful application of the predicted contacts to accurately build tertiary structures for proteins in the PSICOV dataset. Accessibility: The software clmDCA and a server are publicly accessible through http://protein.ict.ac.cn/clmDCA/.

Submitted 31 August, 2018; originally announced September 2018.
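To make the pseudo-likelihood versus composite-likelihood distinction in the clmDCA abstract concrete, the sketch below evaluates three quantities for one configuration of a tiny binary Ising model: the exact log-likelihood, the pseudo-log-likelihood (sum of single-site conditionals) and the pairwise composite log-likelihood (sum of pair conditionals). This is a toy assumption, not clmDCA itself: clmDCA works with protein-sequence MRFs over amino-acid states, and all helper names here are hypothetical.

```python
import itertools
import numpy as np

def log_weight(x, h, J):
    """Unnormalised log-probability of an Ising state x in {-1, +1}^n.
    J must be symmetric with a zero diagonal."""
    return h @ x + 0.5 * x @ J @ x

def exact_loglik(x, h, J):
    """Exact log-likelihood; enumerates all 2^n states, so tiny n only."""
    states = itertools.product([-1, 1], repeat=len(h))
    log_z = np.logaddexp.reduce([log_weight(np.array(s), h, J) for s in states])
    return log_weight(x, h, J) - log_z

def pseudo_loglik(x, h, J):
    """Sum over sites i of log p(x_i | all other sites)."""
    field = h + J @ x  # local field; the zero diagonal excludes self-coupling
    return np.sum(x * field - np.logaddexp(field, -field))

def composite_loglik(x, h, J):
    """Sum over pairs i < j of log p(x_i, x_j | all other sites)."""
    total = 0.0
    for i, j in itertools.combinations(range(len(h)), 2):
        logs = []
        for a, b in itertools.product([-1, 1], repeat=2):
            y = np.array(x, copy=True)
            y[i], y[j] = a, b  # only the pair varies; the rest stays fixed
            logs.append(log_weight(y, h, J))
        total += log_weight(x, h, J) - np.logaddexp.reduce(logs)
    return total

# Usage on a random 6-site model (hypothetical parameters).
rng = np.random.default_rng(0)
n = 6
A = rng.normal(scale=0.5, size=(n, n))
J = np.triu(A, 1) + np.triu(A, 1).T  # symmetric, zero diagonal
h = rng.normal(scale=0.3, size=n)
x = np.array([1, -1, 1, 1, -1, 1])
print(exact_loglik(x, h, J), pseudo_loglik(x, h, J), composite_loglik(x, h, J))
```

The exact term needs a sum over all 2^n states, while each pair conditional normalises over only four states, which is why composite likelihood stays cheap. The sketch only computes the three quantities; it does not by itself demonstrate the approximation-quality result cited in the abstract.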

arXiv:1807.11790

Practical Constrained Optimization of Auction Mechanisms in E-Commerce Sponsored Search Advertising

Authors: Gang Bai, Zhihui Xie, Liang Wang

Abstract: Sponsored search in E-commerce platforms such as Amazon, Taobao and Tmall provides sellers an effective way to reach potential buyers with the most relevant purpose. In this paper, we study the auction mechanism optimization problem in sponsored search on Alibaba's mobile E-commerce platform. Besides generating revenue, we are supposed to maintain an efficient marketplace with plenty of quality users, guarantee a reasonable return on investment (ROI) for advertisers, and meanwhile facilitate a pleasant shopping experience for the users. These requirements essentially pose a constrained optimization problem. Directly optimizing over auction parameters yields a discontinuous, non-convex problem that does not admit effective solutions. One of our major contributions is a practical convex optimization formulation of the original problem. We devise a novel re-parametrization of the auction mechanism with discrete sets of representative instances. To construct the optimization problem, we build an auction simulation system which estimates the resulting business indicators of the selected parameters by replaying the auctions recorded from real online requests. We summarize experiments on real search traffic to analyze the effects of the fidelity of auction simulation, the efficacy under various constraint targets and the influence of regularization. The experimental results show that with proper entropy regularization, we are able to maximize revenue while constraining other business indicators within given ranges.

Submitted 31 July, 2018; originally announced July 2018.

Comments: 6 pages, 1 figure

arXiv:1803.08010

doi 10.1140/epjds/s13688-018-0163-7

Social Media Would Not Lie: Prediction of the 2016 Taiwan Election via Online Heterogeneous Data

Authors: Zheng Xie, Guannan Liu, Junjie Wu, Yong Tan

Abstract: The prevalence of online media has attracted researchers from various domains to explore human behavior and make interesting predictions. In this research, we leverage heterogeneous social media data collected from various online platforms to predict Taiwan's 2016 presidential election. In contrast to most existing research, we take a "signal" view of heterogeneous information and adopt the Kalman filter to fuse multiple signals into daily vote predictions for the candidates. We also quantify the influence of events on the election based on the so-called event study model, which originated in financial research. We obtained the following interesting findings. First, public opinions in online media dominate traditional polls in Taiwan election prediction in terms of both predictive power and timeliness, but offline polls can still help alleviate the sample bias of online opinions. Second, although online signals converge as election day approaches, the simple Facebook "Like" is consistently the strongest indicator of the election result. Third, most influential events have a strong connection to cross-strait relations, and the Chou Tzu-yu flag incident, followed by the apology video one day before the election, increased the vote share of Tsai Ing-Wen by 3.66%. This research justifies the predictive power of online media in politics and the advantages of information fusion. The combined use of the Kalman filter and the event study method contributes to the data-driven political analytics paradigm for both prediction and attribution purposes.

Submitted 3 April, 2018; v1 submitted 21 March, 2018; originally announced March 2018.

Journal ref: EPJ Data Science, 2018, 7:32
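The election abstract fuses multiple online signals into daily predictions with a Kalman filter. A minimal sketch of that idea follows. It assumes (these are not details from the paper) a single scalar vote share that follows a random walk and independent sources with known noise variances; `kalman_fuse` and the noise levels in the usage lines are hypothetical.

```python
import numpy as np

def kalman_fuse(signals, r, q, x0=0.0, p0=1.0):
    """Fuse K noisy daily signals into one estimate per day (scalar Kalman filter).

    signals: array of shape (T, K); NaN marks a missing reading.
    r:       length-K measurement-noise variances, assumed known.
    q:       process-noise variance of the random-walk hidden state.
    """
    signals = np.asarray(signals, dtype=float)
    x, p = x0, p0
    estimates = []
    for row in signals:
        p = p + q  # predict: random walk keeps the mean and inflates the variance
        for z, rk in zip(row, r):
            if np.isnan(z):
                continue  # this source reported nothing today
            gain = p / (p + rk)  # Kalman gain: how much to trust the new reading
            x = x + gain * (z - x)
            p = (1.0 - gain) * p
        estimates.append(x)
    return np.array(estimates)

# Usage: three hypothetical sources of differing quality observing a share of 0.5.
rng = np.random.default_rng(0)
noise_sd = np.array([0.05, 0.10, 0.20])
readings = 0.5 + rng.normal(0.0, noise_sd, size=(200, 3))
daily = kalman_fuse(readings, r=noise_sd ** 2, q=1e-7)
```

Processing the sources one at a time within a day gives the same posterior as a joint update when their noise is independent, and it lets missing readings be skipped. The event-study component of the paper is not modelled here.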

arXiv:1711.09534

Neural Text Generation: A Practical Guide

Authors: Ziang Xie

Abstract: Deep learning methods have recently achieved great empirical success on machine translation, dialogue response generation, summarization, and other text generation tasks. At a high level, the technique has been to train end-to-end neural network models consisting of an encoder model to produce a hidden representation of the source text, followed by a decoder model to generate the target. While such models have significantly fewer pieces than earlier systems, significant tuning is still required to achieve good performance. For text generation models in particular, the decoder can behave in undesired ways, such as by generating truncated or repetitive outputs, outputting bland and generic responses, or in some cases producing ungrammatical gibberish. This paper is intended as a practical guide for resolving such undesired behavior in text generation models, with the aim of helping enable real-world applications.

Submitted 26 November, 2017; originally announced November 2017.
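The text-generation guide lists repetitive outputs among the decoder's undesired behaviors. One widely used decoding-time mitigation (not necessarily the one the guide recommends) is n-gram blocking: refuse any token that would recreate an n-gram already present in the output. The sketch below applies it to greedy decoding over a hypothetical toy bigram scorer; `toy_scores` stands in for a real model.

```python
from typing import Callable, Dict, List, Sequence

def repeats_ngram(tokens: Sequence[str], candidate: str, n: int) -> bool:
    """True if appending `candidate` would create an n-gram already in `tokens`."""
    if n <= 0 or len(tokens) < n - 1:
        return False  # blocking disabled, or too few tokens to form an n-gram
    new_gram = tuple(tokens[len(tokens) - (n - 1):]) + (candidate,)
    seen = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return new_gram in seen

def greedy_decode(score_fn: Callable[[List[str]], Dict[str, float]],
                  max_len: int, no_repeat_n: int = 0, eos: str = "</s>") -> List[str]:
    """Greedy decoding that optionally skips tokens repeating an n-gram."""
    out: List[str] = []
    for _ in range(max_len):
        scores = score_fn(out)
        ranked = sorted(scores, key=scores.get, reverse=True)
        allowed = [t for t in ranked if not repeats_ngram(out, t, no_repeat_n)]
        if not allowed:
            break  # every candidate is blocked
        token = allowed[0]
        if token == eos:
            break
        out.append(token)
    return out

def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical bigram 'model' that prefers the loop 'the cat the cat ...'."""
    table = {
        "": {"the": 1.0},
        "the": {"cat": 0.6, "dog": 0.3, "</s>": 0.1},
        "cat": {"the": 0.7, "</s>": 0.3},
        "dog": {"the": 0.6, "</s>": 0.4},
    }
    return table[prefix[-1] if prefix else ""]

print(greedy_decode(toy_scores, 6))                 # ['the', 'cat', 'the', 'cat', 'the', 'cat']
print(greedy_decode(toy_scores, 6, no_repeat_n=2))  # ['the', 'cat', 'the', 'dog', 'the']
```

Blocking is a blunt constraint: it can also forbid legitimate repeats (names, fixed phrases), so n is usually kept at 3 or more for real text.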

arXiv:1406.7806

Building DNN Acoustic Models for Large Vocabulary Speech Recognition

Authors: Andrew L. Maas, Peng Qi, Ziang Xie, Awni Y. Hannun, Christopher T. Lengerich, Daniel Jurafsky, Andrew Y. Ng

Abstract: Deep neural networks (DNNs) are now a central component of nearly all state-of-the-art speech recognition systems. Building neural network acoustic models requires several design decisions including network architecture, size, and training loss function. This paper offers an empirical investigation into which aspects of DNN acoustic model design are most important for speech recognition system performance. We report DNN classifier performance and final speech recognizer word error rates, and compare DNNs using several metrics to quantify factors influencing differences in task performance. Our first set of experiments uses the standard Switchboard benchmark corpus, which contains approximately 300 hours of conversational telephone speech. We compare standard DNNs to convolutional networks, and present the first experiments using locally-connected, untied neural networks for acoustic modeling. We additionally build systems on a corpus of 2,100 hours of training data by combining the Switchboard and Fisher corpora. This larger corpus allows us to more thoroughly examine the performance of large DNN models, with up to ten times more parameters than those typically used in speech recognition systems. Our results suggest that a relatively simple DNN architecture and optimization technique produces strong results. These findings, along with previous work, help establish a set of best practices for building DNN hybrid speech recognition systems with maximum likelihood training. Our experiments in DNN optimization additionally serve as a case study for training DNNs with discriminative loss functions for speech tasks, as well as DNN classifiers more generally.

Submitted 20 January, 2015; v1 submitted 30 June, 2014; originally announced June 2014.
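The acoustic-model abstract reports final word error rates. WER is conventionally (S + D + I) / N: the word-level Levenshtein distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. A minimal sketch, assuming whitespace tokenisation and no text normalisation (scoring tools typically also normalise case and punctuation); `word_error_rate` is a hypothetical helper name:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between ref[:i-1] and hyp[:j] (rolling DP row)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)  # vs deletion, insertion
        prev = cur
    return prev[-1] / len(ref)

# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count against the hypothesis, WER can exceed 1.0 when the recognizer emits many extra words.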

Showing 1–24 of 24 results for author: Xie, Z