Showing 1–50 of 91 results for author: Wong, K

Searching in archive stat.
  1. arXiv:2406.15170  [pdf, other]

    stat.ME

    Inference for Delay Differential Equations Using Manifold-Constrained Gaussian Processes

    Authors: Yuxuan Zhao, Samuel W. K. Wong

    Abstract: Dynamic systems described by differential equations often involve feedback among system components. When there are time delays for components to sense and respond to feedback, delay differential equation (DDE) models are commonly used. This paper considers the problem of inferring unknown system parameters, including the time delays, from noisy and sparse experimental data observed from the system…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 42 pages, 8 figures

  2. arXiv:2405.12386  [pdf, other]

    stat.ML cs.LG stat.AP stat.CO

    Particle swarm optimization with Applications to Maximum Likelihood Estimation and Penalized Negative Binomial Regression

    Authors: Sisi Shao, Junhyung Park, Weng Kee Wong

    Abstract: General purpose optimization routines such as nlminb, optim (R) or nlmixed (SAS) are frequently used to estimate model parameters in nonstandard distributions. This paper presents Particle Swarm Optimization (PSO), as an alternative to many of the current algorithms used in statistics. We find that PSO can not only reproduce the same results as the above routines, it can also produce results that…

    Submitted 20 May, 2024; originally announced May 2024.

  3. arXiv:2402.08873  [pdf, ps, other]

    stat.ME

    Balancing Method for Non-monotone Missing Data

    Authors: Jianing Dong, Raymond K. W. Wong, Kwun Chuen Gary Chan

    Abstract: Covariate balancing methods have been widely applied to single or monotone missing patterns and have certain advantages over likelihood-based methods and inverse probability weighting approaches based on standard logistic regression. In this paper, we consider non-monotone missing data under the complete-case missing variable condition (CCMV), which is a case of missing not at random (MNAR). Using…

    Submitted 13 February, 2024; originally announced February 2024.

  4. arXiv:2402.06058  [pdf, other]

    stat.ME

    Mathematical programming tools for randomization purposes in small two-arm clinical trials: A case study with real data

    Authors: Alan R. Vazquez, Weng Kee Wong

    Abstract: Modern randomization methods in clinical trials are invariably adaptive, meaning that the assignment of the next subject to a treatment group uses the accumulated information in the trial. Some of the recent adaptive randomization methods use mathematical programming to construct attractive clinical trials that balance the group features, such as their sizes and covariate distributions of their su…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 36 pages, 12 figures

  5. arXiv:2402.01900  [pdf, other]

    stat.ML cs.LG

    Distributional Off-policy Evaluation with Bellman Residual Minimization

    Authors: Sungee Hong, Zhengling Qi, Raymond K. W. Wong

    Abstract: We consider the problem of distributional off-policy evaluation which serves as the foundation of many distributional reinforcement learning (DRL) algorithms. In contrast to most existing works (that rely on supremum-extended statistical distances such as supremum-Wasserstein distance), we study the expectation-extended statistical distance for quantifying the distributional Bellman residuals and…

    Submitted 2 February, 2024; originally announced February 2024.

  6. arXiv:2401.10010  [pdf, ps, other]

    stat.ME

    A global kernel estimator for partially linear varying coefficient additive hazards models

    Authors: Hoi Min Ng, Kin Yau Wong

    Abstract: In biomedical studies, we are often interested in the association between different types of covariates and the times to disease events. Because the relationship between the covariates and event times is often complex, standard survival models that assume a linear covariate effect are inadequate. A flexible class of models for capturing complex interaction effects among types of covariates is the…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 27 pages

    MSC Class: 62N02

  7. arXiv:2401.04723  [pdf, other]

    stat.ME

    Spatio-temporal data fusion for the analysis of in situ and remote sensing data using the INLA-SPDE approach

    Authors: Shiyu He, Samuel W. K. Wong

    Abstract: We propose a Bayesian hierarchical model to address the challenge of spatial misalignment in spatio-temporal data obtained from in situ and satellite sources. The model is fit using the INLA-SPDE approach, which provides efficient computation. Our methodology combines the different data sources in a "fusion" model via the construction of projection matrices in both spatial and temporal domains. T…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 23 pages, 7 figures

  8. arXiv:2312.13044  [pdf, other]

    stat.ME stat.CO

    Particle Gibbs for Likelihood-Free Inference of State Space Models with Application to Stochastic Volatility

    Authors: Zhaoran Hou, Samuel W. K. Wong

    Abstract: State space models (SSMs) are widely used to describe dynamic systems. However, when the likelihood of the observations is intractable, parameter inference for SSMs cannot be easily carried out using standard Markov chain Monte Carlo or sequential Monte Carlo methods. In this paper, we propose a particle Gibbs sampler as a general strategy to handle SSMs with intractable likelihoods in the approxi…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 23 pages

  9. arXiv:2311.03497  [pdf, other]

    stat.AP

    Understanding the Impact of Seasonal Climate Change on Canada's Economy by Region and Sector

    Authors: Shiyu He, Trang Bui, Yuying Huang, Wenling Zhang, Jie Jian, Samuel W. K. Wong, Tony S. Wirjanto

    Abstract: To assess the impact of climate change on the Canadian economy, we investigate and model the relationship between seasonal climate variables and economic growth across provinces and economic sectors. We further provide projections of climate change impacts up to the year 2050, taking into account the diverse climate change patterns and economic conditions across Canada. Our results indicate that r…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 25 pages, 7 figures

  10. arXiv:2310.20537  [pdf, other]

    stat.ME stat.ML

    Directed Cyclic Graph for Causal Discovery from Multivariate Functional Data

    Authors: Saptarshi Roy, Raymond K. W. Wong, Yang Ni

    Abstract: Discovering causal relationship using multivariate functional data has received a significant amount of attention very recently. In this article, we introduce a functional linear structural equation model for causal structure learning when the underlying graph involving the multivariate functions may have cycles. To enhance interpretability, our model involves a low-dimensional causal embedded spa…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: 36 pages, 2 figures, 7 tables

  11. arXiv:2310.15070  [pdf, ps, other]

    stat.ME

    Improving estimation efficiency of case-cohort study with interval-censored failure time data

    Authors: Qingning Zhou, Kin Yau Wong

    Abstract: The case-cohort design is a commonly used cost-effective sampling strategy for large cohort studies, where some covariates are expensive to measure or obtain. In this paper, we consider regression analysis under a case-cohort study with interval-censored failure time data, where the failure time is only known to fall within an interval instead of being exactly observed. A common approach to analyz…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 19 pages, 3 tables

  12. arXiv:2310.07801  [pdf, other]

    cs.CV cs.AI stat.ME

    Trajectory-aware Principal Manifold Framework for Data Augmentation and Image Generation

    Authors: Elvis Han Cui, Bingbin Li, Yanan Li, Weng Kee Wong, Donghui Wang

    Abstract: Data augmentation for deep learning benefits model training, image transformation, medical imaging analysis and many other fields. Many existing methods generate new samples from a parametric distribution, like the Gaussian, with little attention to generate samples along the data manifold in either the input or feature space. In this paper, we verify that there are theoretical and practical advan…

    Submitted 30 July, 2023; originally announced October 2023.

    Comments: 20 figures

  13. arXiv:2309.08039  [pdf, other]

    stat.ME math.ST

    Flexible Functional Treatment Effect Estimation

    Authors: Jiayi Wang, Raymond K. W. Wong, Xiaoke Zhang, Kwun Chuen Gary Chan

    Abstract: We study treatment effect estimation with functional treatments where the average potential outcome functional is a function of functions, in contrast to continuous treatment effect estimation where the target is a function of real numbers. By considering a flexible scalar-on-function marginal structural model, a weight-modified kernel ridge regression (WMKRR) is adopted for estimation. The weight…

    Submitted 14 September, 2023; originally announced September 2023.

  14. arXiv:2306.16909  [pdf, other]

    stat.ME

    A network-based regression approach for identifying subject-specific driver mutations

    Authors: Kin Yau Wong, Donglin Zeng, D. Y. Lin

    Abstract: In cancer genomics, it is of great importance to distinguish driver mutations, which contribute to cancer progression, from causally neutral passenger mutations. We propose a random-effect regression approach to estimate the effects of mutations on the expressions of genes in tumor samples, where the estimation is assisted by a prespecified gene network. The model allows the mutation effects to va…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 23 pages; 9 figures

  15. arXiv:2304.02127  [pdf, other]

    stat.ME

    A Bayesian Collocation Integral Method for Parameter Estimation in Ordinary Differential Equations

    Authors: Mingwei Xu, Samuel W. K. Wong, Peijun Sang

    Abstract: Inferring the parameters of ordinary differential equations (ODEs) from noisy observations is an important problem in many scientific fields. Currently, most parameter estimation methods that bypass numerical integration tend to rely on basis functions or Gaussian processes to approximate the ODE solution and its derivatives. Due to the sensitivity of the ODE solution to its derivatives, these met…

    Submitted 23 October, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  16. Bayesian Nonlinear Tensor Regression with Functional Fused Elastic Net Prior

    Authors: Shuoli Chen, Kejun He, Shiyuan He, Yang Ni, Raymond K. W. Wong

    Abstract: Tensor regression methods have been widely used to predict a scalar response from covariates in the form of a multiway array. In many applications, the regions of tensor covariates used for prediction are often spatially connected with unknown shapes and discontinuous jumps on the boundaries. Moreover, the relationship between the response and the tensor covariates can be nonlinear. In this articl…

    Submitted 16 February, 2023; originally announced February 2023.

    Journal ref: Technometrics, 65:4, 524-536 (2023)

  17. arXiv:2301.12540  [pdf, other]

    stat.ML cs.LG

    Implicit Regularization for Group Sparsity

    Authors: Jiangyuan Li, Thanh V. Nguyen, Chinmay Hegde, Raymond K. W. Wong

    Abstract: We study the implicit regularization of gradient descent towards structured sparsity via a novel neural reparameterization, which we call a diagonally grouped linear neural network. We show the following intriguing property of our reparameterization: gradient descent over the squared regression loss, without any explicit regularization, biases towards solutions with a group sparsity structure. In…

    Submitted 29 January, 2023; originally announced January 2023.

    Comments: accepted by ICLR 2023

  18. arXiv:2301.12302  [pdf, other]

    stat.AP

    A Kriging Metamodel with Adaptive Sampling for Seismic Evaluation of Podium Buildings

    Authors: Yuying Huang, Zhiyong Chen, Samuel W. K. Wong

    Abstract: In this paper, nonlinear time-history dynamic analyses of selected earthquake ground motions are conducted on designated wood-frame podium buildings and the resulting inter-story drifts are analyzed. We aim to construct a reliable region where performance-based seismic design criteria are met, such that a two-step analysis procedure can be used with high confidence. We develop a kriging metamodel…

    Submitted 28 January, 2023; originally announced January 2023.

    Comments: 14 pages, 2 figures

  19. arXiv:2210.14216  [pdf, other]

    stat.ME

    Estimating Boltzmann Averages for Protein Structural Quantities Using Sequential Monte Carlo

    Authors: Zhaoran Hou, Samuel W. K. Wong

    Abstract: Sequential Monte Carlo (SMC) methods are widely used to draw samples from intractable target distributions. Particle degeneracy can hinder the use of SMC when the target distribution is highly constrained or multimodal. As a motivating application, we consider the problem of sampling protein structures from the Boltzmann distribution. This paper proposes a general SMC method that propagates multip…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: 20 pages

  20. arXiv:2210.13323  [pdf, other]

    q-bio.PE stat.AP

    A Comparative Study of Compartmental Models for COVID-19 Transmission in Ontario, Canada

    Authors: Yuxuan Zhao, Samuel W. K. Wong

    Abstract: The number of confirmed COVID-19 cases reached over 1.3 million in Ontario, Canada by June 4, 2022. The continued spread of the virus underlying COVID-19 has been spurred by the emergence of variants since the initial outbreak in December, 2019. Much attention has thus been devoted to tracking and modelling the transmission of COVID-19. Compartmental models are commonly used to mimic epidemic tran…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: 26 pages, 8 figures

  21. arXiv:2206.12891  [pdf, other]

    stat.ME

    Hierarchical nuclear norm penalization for multi-view data

    Authors: Sangyoon Yi, Raymond K. W. Wong, Irina Gaynanova

    Abstract: The prevalence of data collected on the same set of samples from multiple sources (i.e., multi-view data) has prompted significant development of data integration methods based on low-rank matrix factorizations. These methods decompose signal matrices from each view into the sum of shared and individual structures, which are further used for dimension reduction, exploratory analyses, and quantifyi…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: 39 pages, 10 figures, 3 tables

  22. arXiv:2203.12913  [pdf, other]

    cs.AI stat.ML

    k-Rater Reliability: The Correct Unit of Reliability for Aggregated Human Annotations

    Authors: Ka Wong, Praveen Paritosh

    Abstract: Since the inception of crowdsourcing, aggregation has been a common strategy for dealing with unreliable data. Aggregate ratings are more reliable than individual ones. However, many natural language processing (NLP) applications that rely on aggregate ratings only report the reliability of individual ratings, which is the incorrect unit of analysis. In these instances, the data reliability is und…

    Submitted 24 March, 2022; originally announced March 2022.

  23. arXiv:2203.06066  [pdf, other]

    stat.CO

    MAGI: A Package for Inference of Dynamic Systems from Noisy and Sparse Data via Manifold-constrained Gaussian Processes

    Authors: Samuel W. K. Wong, Shihao Yang, S. C. Kou

    Abstract: This article presents the MAGI software package for the inference of dynamic systems. The focus of MAGI is on dynamics modeled by nonlinear ordinary differential equations with unknown parameters. While such models are widely used in science and engineering, the available experimental data for parameter estimation may be noisy and sparse. Furthermore, some system components may be entirely unobser…

    Submitted 16 October, 2023; v1 submitted 11 March, 2022; originally announced March 2022.

    Comments: 47 pages, 10 figures

  24. arXiv:2201.07775  [pdf, other]

    stat.AP q-bio.BM

    Monte Carlo sampling of flexible protein structures: an application to the SARS-CoV-2 omicron variant

    Authors: Samuel W. K. Wong

    Abstract: Proteins can exhibit dynamic structural flexibility as they carry out their functions, especially in binding regions that interact with other molecules. For the key SARS-CoV-2 spike protein that facilitates COVID-19 infection, studies have previously identified several such highly flexible regions with therapeutic importance. However, protein structures available from the Protein Data Bank are pre…

    Submitted 4 February, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

    Comments: 20 pages, 4 figures

  25. arXiv:2201.03464  [pdf, other]

    stat.AP

    Knots and their effect on the tensile strength of lumber: a case study

    Authors: Shuxian Fan, Samuel W. K. Wong, James V. Zidek

    Abstract: When assessing the strength of sawn lumber for use in engineering applications, the sizes and locations of knots are an important consideration. Knots are the most common visual characteristics of lumber, that result from the growth of tree branches. Large individual knots, as well as clusters of distinct knots, are known to have strength-reducing effects. However, industry grading rules that gove…

    Submitted 14 February, 2023; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: 20 pages, 4 figures

  26. arXiv:2111.14623  [pdf, other]

    cs.LG cs.CY stat.AP

    An Overview of Healthcare Data Analytics With Applications to the COVID-19 Pandemic

    Authors: Zhe Fei, Yevgen Ryeznik, Oleksandr Sverdlov, Chee Wei Tan, Weng Kee Wong

    Abstract: In the era of big data, standard analysis tools may be inadequate for making inference and there is a growing need for more efficient and innovative ways to collect, process, analyze and interpret the massive and complex data. We provide an overview of challenges in big data problems and describe how innovative analytical methods, machine learning tools and metaheuristics can tackle general health…

    Submitted 25 November, 2021; originally announced November 2021.

    Journal ref: IEEE TRANSACTIONS ON BIG DATA, 12 August 2021

  27. arXiv:2110.11896  [pdf, other]

    stat.AP

    Multimodel Bayesian Analysis of Load Duration Effects in Lumber Reliability

    Authors: Yunfeng Yang, Martin Lysy, Samuel W. K. Wong

    Abstract: This paper evaluates the reliability of lumber, accounting for the duration-of-load (DOL) effect under different load profiles based on a multimodel Bayesian approach. Three individual DOL models previously used for reliability assessment are considered: the US model, the Canadian model, and the Gamma process model. Procedures for stochastic generation of residential, snow, and wind loads are also…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: 15 pages, 2 figures

  28. Evaluating the Impact of State-Level Public Masking Mandates on New COVID-19 Cases and Deaths in the United States: A Demonstration of the Causal Roadmap

    Authors: Angus K. Wong, Laura B. Balzer

    Abstract: At a national-level, we sought to investigate the effect of public masking mandates on COVID-19 in Fall 2020. Specifically, we aimed to evaluate how the relative growth of COVID-19 cases and deaths would have differed if all states had issued a mandate to mask in public by September 1, 2020 versus if all states had delayed issuing such a mandate. To do so, we applied the Causal Roadmap, a formal f…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: 34 total page (including supp materials)

    Journal ref: Epidemiology, December 8, 2021

  29. arXiv:2109.04640  [pdf, other]

    cs.LG stat.ME

    Projected State-action Balancing Weights for Offline Reinforcement Learning

    Authors: Jiayi Wang, Zhengling Qi, Raymond K. W. Wong

    Abstract: Offline policy evaluation (OPE) is considered a fundamental and challenging problem in reinforcement learning (RL). This paper focuses on the value estimation of a target policy based on pre-collected data generated from a possibly different policy, under the framework of infinite-horizon Markov decision processes. Motivated by the recently developed marginal importance sampling method in RL and t…

    Submitted 9 June, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

  30. arXiv:2108.05574  [pdf, other]

    stat.ML cs.LG

    Implicit Sparse Regularization: The Impact of Depth and Early Stopping

    Authors: Jiangyuan Li, Thanh V. Nguyen, Chinmay Hegde, Raymond K. W. Wong

    Abstract: In this paper, we study the implicit bias of gradient descent for sparse regression. We extend results on regression with quadratic parametrization, which amounts to depth-2 diagonal linear networks, to more general depth-N networks, under more realistic settings of noise and correlated designs. We show that early stopping is crucial for gradient descent to converge to a sparse model, a phenomenon…

    Submitted 26 October, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: 32 pages, accepted by NeurIPS 2021. arXiv admin note: text overlap with arXiv:1909.05122 by other authors

  31. arXiv:2106.07393  [pdf, other]

    stat.AP cs.AI cs.SI

    Cross-replication Reliability -- An Empirical Approach to Interpreting Inter-rater Reliability

    Authors: Ka Wong, Praveen Paritosh, Lora Aroyo

    Abstract: We present a new approach to interpreting IRR that is empirical and contextualized. It is based upon benchmarking IRR against baseline measures in a replication, one of which is a novel cross-replication reliability (xRR) measure based on Cohen's kappa. We call this approach the xRR framework. We opensource a replication dataset of 4 million human judgements of facial expressions and analyze it wi…

    Submitted 11 June, 2021; originally announced June 2021.

  32. arXiv:2106.05850  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Matrix Completion with Model-free Weighting

    Authors: Jiayi Wang, Raymond K. W. Wong, Xiaojun Mao, Kwun Chuen Gary Chan

    Abstract: In this paper, we propose a novel method for matrix completion under general non-uniform missing structures. By controlling an upper bound of a novel balancing error, we construct weights that can actively adjust for the non-uniformity in the empirical risk without explicitly modeling the observation probabilities, and can be computed efficiently via convex optimization. The recovered matrix based…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  33. arXiv:2105.14647  [pdf, ps, other]

    stat.ME

    Orthogonal Subsampling for Big Data Linear Regression

    Authors: Lin Wang, Jake Elmstedt, Weng Kee Wong, Hongquan Xu

    Abstract: The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big data analysis. We propose an orthogonal subsampling (OSS) approach for big data with a focus on linear regression models. The approach is inspired by the fact that an orthogonal array of two levels provide…

    Submitted 30 May, 2021; originally announced May 2021.

  34. arXiv:2105.08835  [pdf, ps, other]

    q-bio.BM stat.AP

    Conformational variability of loops in the SARS-CoV-2 spike protein

    Authors: Samuel W. K. Wong, Zongjun Liu

    Abstract: The SARS-CoV-2 spike (S) protein facilitates viral infection, and has been the focus of many structure determination efforts. Its flexible loop regions are known to be involved in protein binding and may adopt multiple conformations. This paper identifies the S protein loops and studies their conformational variability based on the available Protein Data Bank (PDB) structures. While most loops had…

    Submitted 13 October, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: 24 pages

  35. arXiv:2104.10878  [pdf, other]

    stat.AP q-bio.PE

    Comparing regional and provincial-wide COVID-19 models with physical distancing in British Columbia

    Authors: Geoffrey McGregor, Jennifer Tippett, Andy T. S. Wan, Mengxiao Wang, Samuel W. K. Wong

    Abstract: We study the effects of physical distancing measures for the spread of COVID-19 in regional areas within British Columbia, using the reported cases of the five provincial Health Authorities. Building on the Bayesian epidemiological model of Anderson et al. (2020), we propose a hierarchical regional Bayesian model with time-varying regional parameters between March to December of 2020. In the absen…

    Submitted 13 November, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: 35 pages, 16 figures

    Journal ref: AIMS Mathematics, 2022, 7(4): 6743-6778

  36. arXiv:2104.10041  [pdf, other]

    cs.NE cs.AI stat.AP stat.CO

    Particle swarm optimization in constrained maximum likelihood estimation a case study

    Authors: Elvis Cui, Dongyuan Song, Weng Kee Wong

    Abstract: The aim of paper is to apply two types of particle swarm optimization, global best and local best PSO to a constrained maximum likelihood estimation problem in pseudotime analysis, a sub-field in bioinformatics. The results have shown that particle swarm optimization is extremely useful and efficient when the optimization problem is non-differentiable and non-convex so that analytical solution can…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: 11 pages, 7 figures

  37. arXiv:2103.03437  [pdf, other]

    stat.ME

    Estimation of Partially Conditional Average Treatment Effect by Hybrid Kernel-covariate Balancing

    Authors: Jiayi Wang, Raymond K. W. Wong, Shu Yang, Kwun Chuen Gary Chan

    Abstract: We study nonparametric estimation for the partially conditional average treatment effect, defined as the treatment effect function over an interested subset of confounders. We propose a hybrid kernel weighting estimator where the weights aim to control the balancing error of any function of the confounders from a reproducing kernel Hilbert space after kernel smoothing over the subset of interested…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: 19 pages, 2 figures

  38. arXiv:2101.02304  [pdf, other]

    stat.AP q-bio.BM

    Statistical challenges in the analysis of sequence and structure data for the COVID-19 spike protein

    Authors: Shiyu He, Samuel W. K. Wong

    Abstract: As the major target of many vaccines and neutralizing antibodies against SARS-CoV-2, the spike (S) protein is observed to mutate over time. In this paper, we present statistical approaches to tackle some challenges associated with the analysis of S-protein data. We build a Bayesian hierarchical model to study the temporal and spatial evolution of S-protein sequences, after grouping the sequences i…

    Submitted 30 January, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

    Comments: 21 pages, 5 figures

  39. arXiv:2011.00442  [pdf, other]

    stat.ME

    Penalized estimation for single-index varying-coefficient models with applications to integrative genomic analysis

    Authors: Hoi Min Ng, Binyan Jiang, Kin Yau Wong

    Abstract: Recent technological advances have made it possible to collect high-dimensional genomic data along with clinical data on a large number of subjects. In the studies of chronic diseases such as cancer, it is of great interest to integrate clinical and genomic data to build a comprehensive understanding of the disease mechanisms. Despite extensive studies on integrative analysis, it remains an ongoin…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: 18 pages, 8 figures

  40. arXiv:2010.13568  [pdf, other]

    stat.ML cs.LG stat.ME

    CP Degeneracy in Tensor Regression

    Authors: Ya Zhou, Raymond K. W. Wong, Kejun He

    Abstract: Tensor linear regression is an important and useful tool for analyzing tensor data. To deal with high dimensionality, CANDECOMP/PARAFAC (CP) low-rank constraints are often imposed on the coefficient tensor parameter in the (penalized) $M$-estimation. However, we show that the corresponding optimization may not be attainable, and when this happens, the estimator is not well-defined. This is closely…

    Submitted 22 October, 2020; originally announced October 2020.

    Journal ref: IEEE Access, 9:1, 7775-7788 (2021)

  41. arXiv:2009.11452  [pdf, ps, other]

    stat.ME stat.AP

    A Wavelet-Based Independence Test for Functional Data with an Application to MEG Functional Connectivity

    Authors: Rui Miao, Xiaoke Zhang, Raymond K. W. Wong

    Abstract: Measuring and testing the dependency between multiple random functions is often an important task in functional data analysis. In the literature, a model-based method relies on a model which is subject to the risk of model misspecification, while a model-free method only provides a correlation measure which is inadequate to test independence. In this paper, we adopt the Hilbert-Schmidt Independenc…

    Submitted 23 September, 2020; originally announced September 2020.

  42. Inference of dynamic systems from noisy and sparse data via manifold-constrained Gaussian processes

    Authors: Shihao Yang, Samuel W. K. Wong, S. C. Kou

    Abstract: Parameter estimation for nonlinear dynamic system models, represented by ordinary differential equations (ODEs), using noisy and sparse data is a vital task in many fields. We propose a fast and accurate method, MAGI (MAnifold-constrained Gaussian process Inference), for this task. MAGI uses a Gaussian process model over time-series data, explicitly conditioned on the manifold constraint that deri…

    Submitted 21 February, 2021; v1 submitted 15 September, 2020; originally announced September 2020.

  43. Broadcasted Nonparametric Tensor Regression

    Authors: Ya Zhou, Raymond K. W. Wong, Kejun He

    Abstract: We propose a novel use of a broadcasting operation, which distributes univariate functions to all entries of the tensor covariate, to model the nonlinearity in tensor regression nonparametrically. A penalized estimation and the corresponding algorithm are proposed. Our theoretical investigation, which allows the dimensions of the tensor covariate to diverge, indicates that the proposed estimation…

    Submitted 23 March, 2024; v1 submitted 29 August, 2020; originally announced August 2020.

  44. Low-Rank Covariance Function Estimation for Multidimensional Functional Data

    Authors: Jiayi Wang, Raymond K. W. Wong, Xiaoke Zhang

    Abstract: Multidimensional function data arise from many fields nowadays. The covariance function plays an important role in the analysis of such increasingly common data. In this paper, we propose a novel nonparametric covariance function estimation approach under the framework of reproducing kernel Hilbert spaces (RKHS) that can handle both sparse and dense functional data. We extend multilinear rank stru…

    Submitted 29 August, 2020; originally announced August 2020.

    Comments: 25 pages, 4 figures

  45. arXiv:2006.10400  [pdf, other]

    stat.ML cs.LG

    Median Matrix Completion: from Embarrassment to Optimality

    Authors: Weidong Liu, Xiaojun Mao, Raymond K. W. Wong

    Abstract: In this paper, we consider matrix completion with absolute deviation loss and obtain an estimator of the median matrix. Despite several appealing properties of median, the non-smooth absolute deviation loss leads to computational challenge for large-scale data sets which are increasingly common among matrix completion problems. A simple solution to large-scale problems is parallel computing. Howev…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: 26 pages, 1 figure, 5 tables

  46. arXiv:2002.03537  [pdf, other]

    stat.AP

    Calibrating wood products for load duration and rate: A statistical look at three damage models

    Authors: Samuel W. K. Wong

    Abstract: Lumber and wood-based products are versatile construction materials that are susceptible to weakening as a result of applied stresses. To assess the effects of load duration and rate, experiments have been carried out by applying preset load profiles to sample specimens. This paper studies these effects via a damage modeling approach, by considering three models in the literature: the Gerhards and…

    Submitted 9 February, 2020; originally announced February 2020.

    Comments: 17 pages, 5 figures

  47. arXiv:2001.01006  [pdf, other]

    q-bio.GN cs.LG q-bio.QM stat.ML

    Review of Single-cell RNA-seq Data Clustering for Cell Type Identification and Characterization

    Authors: Shixiong Zhang, Xiangtao Li, Qiuzhen Lin, Ka-Chun Wong

    Abstract: In recent years, the advances in single-cell RNA-seq techniques have enabled us to perform large-scale transcriptomic profiling at single-cell resolution in a high-throughput manner. Unsupervised learning such as data clustering has become the central component to identify and characterize novel cell types and gene expression patterns. In this study, we review the existing single-cell RNA-seq data…

    Submitted 3 January, 2020; originally announced January 2020.

  48. arXiv:1911.11983  [pdf, ps, other]

    cs.LG stat.ML

    Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis

    Authors: Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde

    Abstract: A remarkable recent discovery in machine learning has been that deep neural networks can achieve impressive performance (in terms of both lower training error and higher generalization capacity) in the regime where they are massively over-parameterized. Consequently, over the past year, the community has devoted growing interest in analyzing optimization and generalization properties of over-param…

    Submitted 2 March, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

    Comments: Added Sections 3.2 and 3.4 on inductive biases. Fixed an error in deriving the neural tangent kernel in Section 3.3

  49. arXiv:1910.02114  [pdf, other]

    stat.ML cs.LG stat.AP

    A Comparison Study on Nonlinear Dimension Reduction Methods with Kernel Variations: Visualization, Optimization and Classification

    Authors: Katherine C. Kempfert, Yishi Wang, Cuixian Chen, Samuel W. K. Wong

    Abstract: Because of high dimensionality, correlation among covariates, and noise contained in data, dimension reduction (DR) techniques are often employed to the application of machine learning algorithms. Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and their kernel variants (KPCA, KLDA) are among the most popular DR methods. Recently, Supervised Kernel Principal Component Analy…

    Submitted 4 October, 2019; originally announced October 2019.

  50. arXiv:1909.08182  [pdf]

    cs.LG eess.SP stat.ML

    Predicting Electricity Consumption using Deep Recurrent Neural Networks

    Authors: Anupiya Nugaliyadde, Upeka Somaratne, Kok Wai Wong

    Abstract: Electricity consumption has increased exponentially during the past few decades. This increase is heavily burdening the electricity distributors. Therefore, predicting the future demand for electricity consumption will provide an upper hand to the electricity distributor. Predicting electricity consumption requires many parameters. The paper presents two approaches with one using a Recurrent Neura…

    Submitted 17 September, 2019; originally announced September 2019.