Showing 1–29 of 29 results for author: Gu, M

  1. arXiv:2311.05167

    physics.chem-ph cond-mat.soft cs.LG physics.comp-ph stat.AP

    Perfecting Liquid-State Theories with Machine Intelligence

    Authors: Jianzhong Wu, Mengyang Gu

    Abstract: Recent years have seen a significant increase in the use of machine intelligence for predicting electronic structure, molecular force fields, and the physicochemical properties of various condensed systems. However, substantial challenges remain in developing a comprehensive framework capable of handling a wide range of atomic compositions and thermodynamic conditions. This perspective discusses p…

    Submitted 9 November, 2023; originally announced November 2023.

  2. arXiv:2310.18611

    stat.AP stat.ME

    Sequential Kalman filter for fast online changepoint detection in longitudinal health records

    Authors: Hanmo Li, Yuedong Wang, Mengyang Gu

    Abstract: This article introduces the sequential Kalman filter, a computationally scalable approach for online changepoint detection with temporally correlated data. The temporal correlation was not considered in the Bayesian online changepoint detection approach due to the large computational cost. Motivated by detecting COVID-19 infections for dialysis patients from massive longitudinal health records wit…

    Submitted 1 January, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

  3. arXiv:2310.16136

    stat.AP cs.NI

    Analyzing Disparity and Temporal Progression of Internet Quality through Crowdsourced Measurements with Bias-Correction

    Authors: Hyeongseong Lee, Udit Paul, Arpit Gupta, Elizabeth Belding, Mengyang Gu

    Abstract: Crowdsourced speedtest measurements are an important tool for studying internet performance from the end user perspective. Nevertheless, despite the accuracy of individual measurements, simplistic aggregation of these data points is problematic due to their intrinsic sampling bias. In this work, we utilize a dataset of nearly 1 million individual Ookla Speedtest measurements, correlate each datapo…

    Submitted 7 December, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

  4. arXiv:2309.02468

    physics.comp-ph cond-mat.soft physics.data-an stat.AP stat.ME

    Ab initio uncertainty quantification in scattering analysis of microscopy

    Authors: Mengyang Gu, Yue He, Xubo Liu, Yimin Luo

    Abstract: Estimating parameters from data is a fundamental problem in physics, customarily done by minimizing a loss function between a model and observed statistics. In scattering-based analysis, researchers often employ their domain expertise to select a specific range of wavevectors for analysis, a choice that can vary depending on the specific case. We introduce another paradigm that defines a probabili…

    Submitted 19 February, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: 23 pages, 9 figures

  5. arXiv:2305.08942

    stat.ME physics.data-an stat.AP

    Probabilistic forecast of nonlinear dynamical systems with uncertainty quantification

    Authors: Mengyang Gu, Yizi Lin, Victor Chang Lee, Diana Qiu

    Abstract: Data-driven modeling is useful for reconstructing nonlinear dynamical systems when the underlying process is unknown or too expensive to compute. Having reliable uncertainty assessment of the forecast enables tools to be deployed to predict new scenarios unobserved before. In this work, we first extend parallel partial Gaussian processes for predicting the vector-valued transition function that li…

    Submitted 30 October, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Journal ref: Physica D: Nonlinear Phenomena, 133938 (2023)

  6. arXiv:2305.04140

    stat.ME

    A Nonparametric Mixed-Effects Mixture Model for Patterns of Clinical Measurements Associated with COVID-19

    Authors: Xiaoran Ma, Wensheng Guo, Mengyang Gu, Len Usvyat, Peter Kotanko, Yuedong Wang

    Abstract: Some patients with COVID-19 show changes in signs and symptoms such as temperature and oxygen saturation days before being positively tested for SARS-CoV-2, while others remain asymptomatic. It is important to identify these subgroups and to understand what biological and clinical predictors are related to these subgroups. This information will provide insights into how the immune system may respo…

    Submitted 31 May, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

  7. arXiv:2303.03568

    physics.bio-ph physics.data-an stat.AP

    Data-driven model construction for anisotropic dynamics of active matter

    Authors: Mengyang Gu, Xinyi Fang, Yimin Luo

    Abstract: The dynamics of cellular pattern formation is crucial for understanding embryonic development and tissue morphogenesis. Recent studies have shown that human dermal fibroblasts cultured on liquid crystal elastomers can exhibit an increase in orientational alignment over time, accompanied by cell proliferation, under the influence of the weak guidance of a molecularly aligned substrate. However, a c…

    Submitted 23 August, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: 20 pages, 14 figures

    Journal ref: PRX Life, 1, 013009 (2023)

  8. arXiv:2208.06727

    physics.chem-ph stat.AP

    Reliable emulation of complex functionals by active learning with error control

    Authors: Xinyi Fang, Mengyang Gu, Jianzhong Wu

    Abstract: A statistical emulator can be used as a surrogate of complex physics-based calculations to drastically reduce the computational cost. Its successful implementation hinges on an accurate representation of the nonlinear response surface with a high-dimensional input space. Conventional "space-filling" designs, including random sampling and Latin hypercube sampling, become inefficient as the dimensio…

    Submitted 30 January, 2024; v1 submitted 13 August, 2022; originally announced August 2022.

    Comments: 15 pages, 10 figures

  9. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  10. arXiv:2203.08389

    stat.CO

    Scalable marginalization of correlated latent variables with applications to learning particle interaction kernels

    Authors: Mengyang Gu, Xubo Liu, Xinyi Fang, Sui Tang

    Abstract: Marginalization of latent variables or nuisance parameters is a fundamental aspect of Bayesian inference and uncertainty quantification. In this work, we focus on scalable marginalization of latent variables in modeling correlated data, such as spatio-temporal or functional observations. We first introduce Gaussian processes (GPs) for modeling correlated data and highlight the computational challe…

    Submitted 9 October, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

  11. arXiv:2201.01476

    stat.CO stat.AP

    RobustCalibration: Robust Calibration of Computer Models in R

    Authors: Mengyang Gu

    Abstract: Two fundamental research tasks in science and engineering are forward predictions and data inversion. This article introduces a recent R package RobustCalibration for Bayesian data inversion and model calibration by experiments and field observations. Mathematical models for forward predictions are often written in computer code, and they can be computationally expensive and slow to run. To overcome t…

    Submitted 18 February, 2024; v1 submitted 5 January, 2022; originally announced January 2022.

  12. arXiv:2108.06072

    physics.chem-ph stat.AP

    Efficient force field and energy emulation through partition of permutationally equivalent atoms

    Authors: Hao Li, Musen Zhou, Jessalyn Sebastian, Jianzhong Wu, Mengyang Gu

    Abstract: Gaussian process (GP) emulator has been used as a surrogate model for predicting force field and molecular potential, to overcome the computational bottleneck of molecular dynamics simulation. Integrating both atomic force and energy in predictions was found to be more accurate than using energy alone, yet it requires $O((NM)^3)$ computational operations for computing the likelihood function and m…

    Submitted 9 May, 2022; v1 submitted 13 August, 2021; originally announced August 2021.

  13. arXiv:2105.01200

    cond-mat.soft physics.data-an stat.AP

    Uncertainty quantification and estimation in differential dynamic microscopy

    Authors: Mengyang Gu, Yimin Luo, Yue He, Matthew E. Helgeson, Megan T. Valentine

    Abstract: Differential dynamic microscopy (DDM) is a form of video image analysis that combines the sensitivity of scattering and the direct visualization benefits of microscopy. DDM is broadly useful in determining dynamical properties including the intermediate scattering function for many spatiotemporally correlated systems. Despite its straightforward analysis, DDM has not been fully adopted as a routin…

    Submitted 11 April, 2022; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: Published in Physical Review E. 24 pages, 12 figures. Typos in Section 2B are corrected

    Journal ref: Phys. Rev. E 104, 034610 (2021)

  14. arXiv:2011.10863

    stat.ME stat.AP stat.CO

    Gaussian orthogonal latent factor processes for large incomplete matrices of correlated data

    Authors: Mengyang Gu, Hanmo Li

    Abstract: We introduce Gaussian orthogonal latent factor processes for modeling and predicting large correlated data. To handle the computational challenge, we first decompose the likelihood function of the Gaussian random field with a multi-dimensional input domain into a product of densities at the orthogonal components with lower-dimensional inputs. The continuous-time Kalman filter is implemented to c…

    Submitted 26 November, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

  15. arXiv:2010.11514

    stat.AP

    Robust estimation of SARS-CoV-2 epidemic in US counties

    Authors: Hanmo Li, Mengyang Gu

    Abstract: The COVID-19 outbreak is asynchronous in US counties. Mitigating the COVID-19 transmission requires not only the state and federal level order of protective measures such as social distancing and testing, but also public awareness of time-dependent risk and reactions at county and community levels. We propose a robust approach to estimate the heterogeneous progression of SARS-CoV-2 at all US count…

    Submitted 29 April, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

  16. arXiv:2010.05942

    cs.CE physics.comp-ph stat.AP

    Emulating the First Principles of Matter: A Probabilistic Roadmap

    Authors: Jianzhong Wu, Mengyang Gu

    Abstract: This chapter provides a tutorial overview of first principles methods to describe the properties of matter at the ground state or equilibrium. It begins with a brief introduction to quantum and statistical mechanics for predicting the electronic structure and diverse static properties of many-particle systems useful for practical applications. Pedagogical examples are given to illustrate the ba…

    Submitted 30 September, 2020; originally announced October 2020.

  17. arXiv:2006.15416

    cond-mat.stat-mech cs.LG math.DS nlin.AO stat.ML

    Thermodynamic Machine Learning through Maximum Work Production

    Authors: A. B. Boyd, J. P. Crutchfield, M. Gu

    Abstract: Adaptive systems -- such as a biological organism gaining survival advantage, an autonomous robot executing a functional task, or a motor protein transporting intracellular nutrients -- must model the regularities and stochasticity in their environments to take full advantage of thermodynamic resources. Analogously, but in a purely computational realm, machine learning algorithms estimate models t…

    Submitted 12 April, 2021; v1 submitted 27 June, 2020; originally announced June 2020.

    Comments: 29 pages, 10 figures, 6 appendices; http://csc.ucdavis.edu/~cmg/compmech/pubs/tml.htm

  18. arXiv:2005.06194

    quant-ph cs.LG stat.ML

    Boosting on the shoulders of giants in quantum device calibration

    Authors: Alex Wozniakowski, Jayne Thompson, Mile Gu, Felix Binder

    Abstract: Traditional machine learning applications, such as optical character recognition, arose from the inability to explicitly program a computer to perform a routine task. In this context, learning algorithms usually derive a model exclusively from the evidence present in a massive dataset. Yet in some scientific disciplines, obtaining an abundance of data is an impractical luxury, however; there is an…

    Submitted 13 May, 2020; originally announced May 2020.

  19. arXiv:1909.06297

    cs.LG stat.ML

    Fast Low-rank Metric Learning for Large-scale and High-dimensional Data

    Authors: Han Liu, Zhizhong Han, Yu-Shen Liu, Ming Gu

    Abstract: Low-rank metric learning aims to learn better discrimination of data subject to low-rank constraints. It keeps the intrinsic low-rank structure of datasets and reduces the time cost and memory usage in metric learning. However, it is still a challenge for current methods to handle datasets with both high dimensions and large numbers of samples. To address this issue, we present a novel fast low-ra…

    Submitted 13 September, 2019; originally announced September 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

  20. arXiv:1906.00229

    cs.LG stat.ML

    Variational Langevin Hamiltonian Monte Carlo for Distant Multi-modal Sampling

    Authors: Minghao Gu, Shiliang Sun

    Abstract: The Hamiltonian Monte Carlo (HMC) sampling algorithm exploits Hamiltonian dynamics to construct efficient Markov Chain Monte Carlo (MCMC), which has become increasingly popular in machine learning and statistics. Since HMC uses the gradient information of the target distribution, it can explore the state space much more efficiently than the random-walk proposals. However, probabilistic inference i…

    Submitted 1 June, 2019; originally announced June 2019.

  21. arXiv:1812.09738

    quant-ph cond-mat.stat-mech stat.CO

    Surveying structural complexity in quantum many-body systems

    Authors: Whei Yeap Suen, Thomas J. Elliott, Jayne Thompson, Andrew J. P. Garner, John R. Mahoney, Vlatko Vedral, Mile Gu

    Abstract: Quantum many-body systems exhibit a rich and diverse range of exotic behaviours, owing to their underlying non-classical structure. These systems present a deep structure beyond those that can be captured by measures of correlation and entanglement alone. Using tools from complexity science, we characterise such structure. We investigate the structural complexities that can be found within the pat…

    Submitted 18 March, 2022; v1 submitted 23 December, 2018; originally announced December 2018.

    Comments: 9 pages, 5 figures

  22. Calibration of imperfect geophysical models by multiple satellite interferograms with measurement bias

    Authors: Mengyang Gu, Kyle Anderson, Erika McPhillips

    Abstract: Model calibration consists of using experimental or field data to estimate the unknown parameters of a mathematical model. The presence of model discrepancy and measurement bias in the data complicates this task. Satellite interferograms, for instance, are widely used for calibrating geophysical models in geological hazard quantification. In this work, we used satellite interferograms to relate gr…

    Submitted 25 February, 2023; v1 submitted 27 October, 2018; originally announced October 2018.

  23. arXiv:1808.10868

    stat.ME

    Generalized probabilistic principal component analysis of correlated data

    Authors: Mengyang Gu, Weining Shen

    Abstract: Principal component analysis (PCA) is a well-established tool in machine learning and data processing. The principal axes in PCA were shown to be equivalent to the maximum marginal likelihood estimator of the factor loading matrix in a latent factor model for the observed data, assuming that the latent factors are independently distributed as standard normal distributions. However, the independenc…

    Submitted 23 October, 2019; v1 submitted 31 August, 2018; originally announced August 2018.

  24. arXiv:1807.10840

    stat.AP

    Nonparametric estimation of utility functions

    Authors: Mengyang Gu, Debarun Bhattacharjya, Dharmashankar Subramanian

    Abstract: Inferring a decision maker's utility function typically involves an elicitation phase where the decision maker responds to a series of elicitation queries, followed by an estimation phase where the state-of-the-art is to either fit the response data to a parametric form (such as the exponential or power function) or perform linear interpolation. We introduce a Bayesian nonparametric method involvi…

    Submitted 27 July, 2018; originally announced July 2018.

  25. arXiv:1804.09329

    stat.ME

    Jointly Robust Prior for Gaussian Stochastic Process in Emulation, Calibration and Variable Selection

    Authors: Mengyang Gu

    Abstract: Gaussian stochastic process (GaSP) has been widely used in two fundamental problems in uncertainty quantification, namely the emulation and calibration of mathematical models. Some objective priors, such as the reference prior, are studied in the context of emulating (approximating) computationally expensive mathematical models. In this work, we introduce a new class of priors, called the jointly…

    Submitted 7 September, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

  26. arXiv:1801.01874

    stat.CO

    RobustGaSP: Robust Gaussian Stochastic Process Emulation in R

    Authors: Mengyang Gu, Jesús Palomo, James O. Berger

    Abstract: Gaussian stochastic process emulation is a powerful tool for approximating computationally intensive computer models. However, estimation of parameters in the GaSP emulator is a challenging task. No closed-form estimator is available and many numerical problems arise with standard estimates, e.g., the maximum likelihood estimator. In this package, we implement a marginal posterior mode estimator,…

    Submitted 14 June, 2019; v1 submitted 5 January, 2018; originally announced January 2018.

  27. Fast Nonseparable Gaussian Stochastic Process with Application to Methylation Level Interpolation

    Authors: Mengyang Gu, Yanxun Xu

    Abstract: Gaussian stochastic process (GaSP) has been widely used as a prior over functions due to its flexibility and tractability in modeling. However, the computational cost in evaluating the likelihood is $O(n^3)$, where $n$ is the number of observed points in the process, as it requires to invert the covariance matrix. This bottleneck prevents GaSP being widely used in large-scale data. We propose a ge…

    Submitted 22 November, 2021; v1 submitted 30 November, 2017; originally announced November 2017.

    Comments: Published version of the paper. The typos in the joint distribution in supplementary materials are corrected

    Journal ref: Journal of Computational and Graphical Statistics, 29:2, 250-260 (2020)

  28. arXiv:1707.08215

    stat.ME

    Scaled Gaussian Stochastic Process for Computer Model Calibration and Prediction

    Authors: Mengyang Gu, Long Wang

    Abstract: We consider the problem of calibrating an imperfect computer model using experimental data. To compensate the misspecification of the computer model and make more accurate predictions, a discrepancy function is often included and modeled via a Gaussian stochastic process (GaSP). The calibrated computer model alone, however, sometimes fits the experimental data poorly, as the calibration parameters…

    Submitted 3 May, 2018; v1 submitted 25 July, 2017; originally announced July 2017.

  29. arXiv:1511.08528

    math.NA stat.CO

    Gaussian Elimination with Randomized Complete Pivoting

    Authors: Christopher Melgaard, Ming Gu

    Abstract: Gaussian elimination with partial pivoting (GEPP) has long been among the most widely used methods for computing the LU factorization of a given matrix. However, this method is also known to fail for matrices that induce large element growth during the factorization process. In this paper, we propose a new scheme, Gaussian elimination with randomized complete pivoting (GERCP) for the efficient and…

    Submitted 26 November, 2015; originally announced November 2015.

    Comments: 5 figures, 33 pages

    MSC Class: 60; 65