Showing 1–22 of 22 results for author: Wong, W H

  1. arXiv:2402.10456  [pdf, other]

    stat.ML cs.LG stat.AP stat.ME

    Generative Modeling for Tabular Data via Penalized Optimal Transport Network

    Authors: Wenhui Sophia Lu, Chenyang Zhong, Wing Hung Wong

    Abstract: The task of precisely learning the probability distribution of rows within tabular data and producing authentic synthetic samples is both crucial and non-trivial. Wasserstein generative adversarial network (WGAN) marks a notable improvement in generative modeling, addressing the challenges faced by its predecessor, generative adversarial network. However, due to the mixed data types and multimodal…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 37 pages, 23 figures

  2. arXiv:2212.05925  [pdf, other]

    stat.ML cs.LG

    CausalEGM: a general causal inference framework by encoding generative modeling

    Authors: Qiao Liu, Zhongren Chen, Wing Hung Wong

    Abstract: Although understanding and characterizing causal effects have become essential in observational studies, it is challenging when the confounders are high-dimensional. In this article, we develop a general framework $\textit{CausalEGM}$ for estimating causal effects by encoding generative modeling, which can be applied in both binary and continuous treatment settings. Under the potential outcome fra…

    Submitted 16 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

  3. arXiv:2104.10633  [pdf]

    math.ST stat.ME

    A calculus for causal inference with instrumental variables

    Authors: Wing Hung Wong

    Abstract: Under a general structural equation framework for causal inference, we provide a definition of the causal effect of a variable X on another variable Y, and propose an approach to estimate this causal effect via the use of instrumental variables.

    Submitted 23 April, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: 10 pages

  4. arXiv:2004.09017  [pdf, other]

    cs.LG stat.ME stat.ML

    Roundtrip: A Deep Generative Neural Density Estimator

    Authors: Qiao Liu, Jiaze Xu, Rui Jiang, Wing Hung Wong

    Abstract: Density estimation is a fundamental problem in both statistics and machine learning. In this study, we proposed Roundtrip as a general-purpose neural density estimator based on deep generative models. Roundtrip retains the generative power of generative adversarial networks (GANs) but also provides estimates of density values. Unlike previous neural density estimators that put stringent conditions…

    Submitted 4 September, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

    Journal ref: Proceedings of the National Academy of Sciences, 2021, 118(15)

  5. arXiv:1908.02910  [pdf, other]

    stat.ML cs.LG

    Mini-batch Metropolis-Hastings MCMC with Reversible SGLD Proposal

    Authors: Tung-Yu Wu, Y. X. Rachel Wang, Wing H. Wong

    Abstract: Traditional MCMC algorithms are computationally intensive and do not scale well to large data. In particular, the Metropolis-Hastings (MH) algorithm requires passing over the entire dataset to evaluate the likelihood ratio in each iteration. We propose a general framework for performing MH-MCMC using mini-batches of the whole dataset and show that this gives rise to approximately a tempered statio…

    Submitted 28 August, 2019; v1 submitted 7 August, 2019; originally announced August 2019.

  6. arXiv:1807.06776  [pdf, other]

    stat.ME

    Detecting strong signals in gene perturbation experiments: An adaptive approach with power guarantee and FDR control

    Authors: Leying Guan, Xi Chen, Wing Hung Wong

    Abstract: The perturbation of a transcription factor should affect the expression levels of its direct targets. However, not all genes showing changes in expression are direct targets. To increase the chance of detecting direct targets, we propose a modified two-group model where the null group corresponds to genes which are not direct targets, but can have small non-zero effects. We model the behaviour of…

    Submitted 18 July, 2018; originally announced July 2018.

  7. arXiv:1707.09705  [pdf, other]

    stat.CO

    Mini-batch Tempered MCMC

    Authors: Dangna Li, Wing H Wong

    Abstract: In this paper we propose a general framework for performing MCMC with only a mini-batch of data. We show that by estimating the Metropolis-Hastings ratio with only a mini-batch of data, one is essentially sampling from the true posterior raised to a known temperature. We show by experiments that our method, Mini-batch Tempered MCMC (MINT-MCMC), can efficiently explore multiple modes of a posterior distri…

    Submitted 21 May, 2018; v1 submitted 30 July, 2017; originally announced July 2017.

  8. arXiv:1610.07213  [pdf, other]

    stat.ME q-bio.MN q-bio.QM

    Stochastic Modeling and Statistical Inference of Intrinsic Noise in Gene Regulation System via Chemical Master Equation

    Authors: Chao Du, Wing Hong Wong

    Abstract: Intrinsic noise, the stochastic cell-to-cell fluctuations in mRNAs and proteins, has been observed and proved to play important roles in cellular systems. Due to the recent development in single-cell-level measurement technology, the studies on intrinsic noise are becoming increasingly popular among scholars. The chemical master equation (CME) has been used to model the evolutions of complex chemi…

    Submitted 11 November, 2017; v1 submitted 23 October, 2016; originally announced October 2016.

    Comments: 64 pages, 5 figures

  9. arXiv:1605.06220  [pdf, other]

    stat.ML cs.LG

    Convergence of Contrastive Divergence with Annealed Learning Rate in Exponential Family

    Authors: Bai Jiang, Tung-yu Wu, Wing H. Wong

    Abstract: In our recent paper, we showed that in exponential family, contrastive divergence (CD) with fixed learning rate will give asymptotically consistent estimates \cite{wu2016convergence}. In this paper, we establish consistency and convergence rate of CD with annealed learning rate $η_t$. Specifically, suppose CD-$m$ generates the sequence of parameters $\{θ_t\}_{t \ge 0}$ using an i.i.d. data sample…

    Submitted 20 May, 2016; originally announced May 2016.

  10. arXiv:1603.05729  [pdf, other]

    stat.ML

    Convergence of Contrastive Divergence Algorithm in Exponential Family

    Authors: Bai Jiang, Tung-Yu Wu, Yifan Jin, Wing H. Wong

    Abstract: The Contrastive Divergence (CD) algorithm has achieved notable success in training energy-based models including Restricted Boltzmann Machines and played a key role in the emergence of deep learning. The idea of this algorithm is to approximate the intractable term in the exact gradient of the log-likelihood function by using short Markov chain Monte Carlo (MCMC) runs. The approximate gradient is…

    Submitted 27 February, 2018; v1 submitted 17 March, 2016; originally announced March 2016.

    MSC Class: 68W48; 60J20; 93E15

  11. arXiv:1510.02175  [pdf, other]

    stat.ME stat.CO stat.ML

    Learning Summary Statistic for Approximate Bayesian Computation via Deep Neural Network

    Authors: Bai Jiang, Tung-yu Wu, Charles Zheng, Wing H. Wong

    Abstract: Approximate Bayesian Computation (ABC) methods are used to approximate posterior distributions in models with unknown or computationally intractable likelihoods. Both the accuracy and computational efficiency of ABC depend on the choice of summary statistic, but outside of special cases where the optimal summary statistics are known, it is unclear which guiding principles can be used to construct…

    Submitted 16 March, 2017; v1 submitted 7 October, 2015; originally announced October 2015.

    Comments: 27 pages, 10 figures

  12. arXiv:1410.0726  [pdf, other]

    stat.CO

    co-BPM: a Bayesian Model for Divergence Estimation

    Authors: Kun Yang, Hao Su, Wing Hung Wong

    Abstract: Divergence is not only an important mathematical concept in information theory, but is also applied to machine learning problems such as low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection. We proposed a Bayesian model---co-BPM---to characterize the discrepancy of two sample sets, i.e., to estimate the divergence of their underlying distributions. In order…

    Submitted 20 November, 2016; v1 submitted 2 October, 2014; originally announced October 2014.

    Comments: Key Words: coupled binary partition, divergence, MCMC, clustering, classification

  13. arXiv:1404.1425  [pdf, other]

    stat.ML

    Density Estimation via Discrepancy Based Adaptive Sequential Partition

    Authors: Dangna Li, Kun Yang, Wing Hung Wong

    Abstract: Given $iid$ observations from an unknown absolutely continuous distribution defined on some domain $Ω$, we propose a nonparametric method to learn a piecewise constant function to approximate the underlying probability density function. Our density estimate is a piecewise constant function defined on a binary partition of $Ω$. The key ingredient of the algorithm is to use discrepancy, a concept orig…

    Submitted 11 March, 2018; v1 submitted 4 April, 2014; originally announced April 2014.

    Comments: Binary Partition, Star Discrepancy, Density Estimation, Mode Seeking, Level Set Tree

  14. arXiv:1403.4370  [pdf, other]

    stat.AP

    Discovering and Visualizing Hierarchy in Multivariate Data

    Authors: Kun Yang, Wing Hung Wong

    Abstract: How to extract useful insights from data is always a challenge, especially if the data is multidimensional. Often, the data can be organized according to certain hierarchical structures that stem either from the data collection process or from the information and phenomena carried by the data itself. The current study attempts to discover and visualize these underlying hierarchies. By regarding…

    Submitted 20 April, 2016; v1 submitted 18 March, 2014; originally announced March 2014.

  15. arXiv:1401.2597  [pdf, other]

    math.ST stat.ME

    Multivariate Density Estimation via Adaptive Partitioning (I): Sieve MLE

    Authors: Linxi Liu, Wing Hung Wong

    Abstract: We study a non-parametric approach to multivariate density estimation. The estimators are piecewise constant density functions supported by binary partitions. The partition of the sample space is learned by maximizing the likelihood of the corresponding histogram on that partition. We analyze the convergence rate of the sieve maximum likelihood estimator, and reach a conclusion that for a relative…

    Submitted 19 August, 2015; v1 submitted 12 January, 2014; originally announced January 2014.

  16. arXiv:1309.5489  [pdf, other]

    stat.CO

    Computational Aspects of Optional Pólya Tree

    Authors: Hui Jiang, John C. Mu, Kun Yang, Chao Du, Luo Lu, Wing Hung Wong

    Abstract: Optional Pólya Tree (OPT) is a flexible non-parametric Bayesian model for density estimation. Despite its merits, the computation for OPT inference is challenging. In this paper we present time complexity analysis for OPT inference and propose two algorithmic improvements. The first improvement, named Limited-Lookahead Optional Pólya Tree (LL-OPT), aims to greatly accelerate the computation for OP…

    Submitted 21 September, 2013; originally announced September 2013.

  17. arXiv:1207.3137  [pdf, ps, other]

    q-bio.MN stat.AP

    Learning a nonlinear dynamical system model of gene regulation: A perturbed steady-state approach

    Authors: Arwen Vanice Bradley, Ye Henry Li, Bokyung Choi, Wing Hung Wong

    Abstract: Biological structure and function depend on complex regulatory interactions between many genes. A wealth of gene expression data is available from high-throughput genome-wide measurement technologies, but effective gene regulatory network inference methods are still needed. Model-based methods founded on quantitative descriptions of gene regulation are among the most promising, but many such metho…

    Submitted 25 March, 2016; v1 submitted 12 July, 2012; originally announced July 2012.

    Comments: Published at http://dx.doi.org/10.1214/13-AOAS645 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS645

    Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 3, 1311-1333

  18. arXiv:1106.3211  [pdf, ps, other]

    stat.ME q-bio.GN

    Statistical Modeling of RNA-Seq Data

    Authors: Julia Salzman, Hui Jiang, Wing Hung Wong

    Abstract: Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abu…

    Submitted 16 June, 2011; originally announced June 2011.

    Comments: Published at http://dx.doi.org/10.1214/10-STS343 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS343

    Journal ref: Statistical Science 2011, Vol. 26, No. 1, 62-83

  19. From EM to Data Augmentation: The Emergence of MCMC Bayesian Computation in the 1980s

    Authors: Martin A. Tanner, Wing H. Wong

    Abstract: It was known from Metropolis et al. [J. Chem. Phys. 21 (1953) 1087--1092] that one can sample from a distribution by performing Monte Carlo simulation from a Markov chain whose equilibrium distribution is equal to the target distribution. However, it took several decades before the statistical community embraced Markov chain Monte Carlo (MCMC) as a general computational tool in Bayesian inference.…

    Submitted 12 April, 2011; originally announced April 2011.

    Comments: Published at http://dx.doi.org/10.1214/10-STS341 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS341

    Journal ref: Statistical Science 2010, Vol. 25, No. 4, 506-516

  20. arXiv:1011.1253  [pdf, ps, other]

    stat.ME math.ST

    Coupling optional Pólya trees and the two sample problem

    Authors: Li Ma, Wing H. Wong

    Abstract: Testing and characterizing the difference between two data samples is of fundamental interest in statistics. Existing methods such as Kolmogorov-Smirnov and Cramer-von-Mises tests do not scale well as the dimensionality increases and provide no easy way to characterize the difference should it exist. In this work, we propose a theoretical framework for inference that addresses these challenges in…

    Submitted 22 March, 2011; v1 submitted 4 November, 2010; originally announced November 2010.

    Comments: 44 pages, 6 figures

    MSC Class: 62F15; 62G99

  21. Reconstructing the energy landscape of a distribution from Monte Carlo samples

    Authors: Qing Zhou, Wing Hung Wong

    Abstract: Defining the energy function as the negative logarithm of the density, we explore the energy landscape of a distribution via the tree of sublevel sets of its energy. This tree represents the hierarchy among the connected components of the sublevel sets. We propose ways to annotate the tree so that it provides information on both topological and statistical aspects of the distribution, such as th…

    Submitted 26 January, 2009; originally announced January 2009.

    Comments: Published at http://dx.doi.org/10.1214/08-AOAS196 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS196

    Journal ref: Annals of Applied Statistics 2008, Vol. 2, No. 4, 1307-1331

  22. Coupling hidden Markov models for the discovery of Cis-regulatory modules in multiple species

    Authors: Qing Zhou, Wing Hung Wong

    Abstract: Cis-regulatory modules (CRMs) composed of multiple transcription factor binding sites (TFBSs) control gene expression in eukaryotic genomes. Comparative genomic studies have shown that these regulatory elements are more conserved across species due to evolutionary constraints. We propose a statistical method to combine module structure and cross-species orthology in de novo motif discovery. We u…

    Submitted 31 August, 2007; originally announced August 2007.

    Comments: Published at http://dx.doi.org/10.1214/07-AOAS103 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS103

    Journal ref: Annals of Applied Statistics 2007, Vol. 1, No. 1, 36-65